We Need More than RAID-1

Backups of computer data are needed more today than ever before. In addition to common human error, our data is exposed to environmental hazards, equipment failure, and malicious attacks. The idea that disk mirroring within RAID technology is sufficient by itself to secure data is not well founded in the reality of information systems.

Many believe, and mostly rightfully so, that the advantage of RAID is that fault-tolerant and redundant backup schemes exist within the technology to help prevent or predict failure or, at the very least, offer an economical way to recover as much (potentially lost) data as possible. Unfortunately, RAID technology without an optimized backup and recovery strategy (across the enterprise) can leave a company without data when it’s needed, with a recovery time longer than the desired recovery time objective, or worse, with irretrievable data loss. The primary reasons are that technology is not perfect and will eventually fail, and that the architecture is not perfect either: system design compromises are often necessary to control costs or access constraints.

Disk mirroring is a very commonly found version of RAID technology, often called RAID-1. It costs 2-3 times as much as single-disk striping (RAID-0), has higher reliability than all but RAID-6, and has higher transfer rates and maximum I/O rates than a single disk. RAID-1 has become the data backup scheme of choice for many a business.

In another life, I was the lead reliability engineer on early versions (some say the first) of a RAID-6 subsystem technology, a product known as Iceberg. The RAID-6 claim to fame is that it is the most reliable RAID technology. Yet even with such promises of high availability, additional backup and recovery schemes were (and are) necessary. RAID represents the technology, including hardware, software, peripheral devices, and the mechanisms to connect to the user community. A proper backup and recovery strategy, however, considers the failure mechanisms, disaster impacts, downtime precipitators, and human factors that will inevitably interrupt the backup system.

IT System or Users Own Data Backup? Be Careful….

Users are responsible for their own backup of critical data. Which method they choose is up to them, usually. More importantly, the consequences of choosing incorrectly may pose significant financial or safety issues. Or, the results may be inconsequential depending on the nature and importance of the data.

Whether or not local hard drive data stored by users should be backed up elsewhere is a matter of IT policy. Unlike the days when spinning hard drives or tape were the only available backup schemes, today (2010) we have more sophisticated means of saving important data. For example, continuous data protection (CDP) provides for instantaneous and almost infinite recovery points. Externally, cloud computing offers a cost-effective means of automatically (on a pre-determined schedule) backing up your (personal or company) data. Products like ‘CloudSwitch’, DigiData’s Two-Stage Vaulting, and dozens of other services tout value for dollar and present themselves as the solutions of choice for disaster-resistant backup.[1]

IT managers and systems administrators have often warned employees to allow the IT department to manage all backups. The reasons given are that IT-managed backup is faster, cheaper, more reliable, and less prone to single-point failures, since it uses remote backup and storage across a secure internet connection. Company data is more often vulnerable to interception, corruption, misplacement, and hacking when individuals attempt to do their own backup. We also hear more frequently of employees needing to take extra time to redo work because they didn’t back up to a main server provided by the IT department’s central or hybrid computing system.

On the other hand, users complain about IT owning all the backups. They say that when they store data themselves they can get to it whenever they want, don’t have to worry about system outages, and ‘don’t trust IT’ with their confidential data. IT departments are also known for sending out messages telling employees to delete data, save it elsewhere, or otherwise lose it, because storage maximums are being reached and space needs to be freed up to avoid incurring extra hardware costs. Now enter the cloud computing mentioned earlier: a simple (yet to be perfected) solution to help alleviate both the congestion and the higher cost of additional hardware. Further, the cloud can be used to dynamically back up only what’s new, and controls can guide that software policy decision over time. In other words, cloud backup can be optimized.

[1] Farajun, Eran, (March 2010). “Cloud Computing Backup? Five Key Questions”. Retrieved 10-22-10

Cyber Attack: Law Enforcement Helps?

Working with law enforcement can be a help and a hindrance to routine business operations.

In the event of a suspected cyber attack or other information systems-based crime, law enforcement can play a helpful and crucial role. The degree of helpfulness may depend on the size, nature, or complexity of the intrusion or breach.

Because of the way a crime scene is investigated, law enforcement may need to interfere with data center operations. Two reasons for this are that 1) the evidence chain of custody needs to be preserved and 2) a proper forensics investigation using a variety of tools often requires careful steps to retain data and the original states of equipment, hardware, and software.

Some industries are required to report all information and data breaches, while for a few reporting may be considered voluntary.

I don’t have any direct experience with such cyber cases and therefore cannot offer any insights into how they actually unfold. I have had conversations with both law enforcement colleagues and a couple of business people who needed to call for help due to a suspected data security breach. My takeaway from these discussions is that, when in doubt, it makes good business sense to call in the authorities as early in the process as possible. Early means as soon as the breach is suspected, particularly if the industry is regulated (e.g. financial, energy, etc.).

I don’t think it’s prudent to worry about ‘gee what else might they [law] find when they’re here’. While there is the possibility that an inept first responder may muck up the scene, I don’t think that is common. Today, there is good solid forensics training for investigators, FBI, Post Office and other agencies to help ensure that a proper and thorough investigation is conducted.

Also, in the event that criminal or civil charges need to be brought against a perpetrator, a documented police crime scene report will be necessary. This report may also be necessary for coverage and claims to the insurance carrier.

Insurance vs BC Planning or Both?

If the cost of insurance to recover lost revenue is lower than the cost of plans and capabilities that ensure revenue continuation, an organization may be tempted to simply purchase the insurance and stop the information systems continuity planning. I think that would be a mistake in nearly all cases.

From our readings this week, particularly Parisi and Callahan (2010), I found that I would support a multipronged approach to ensuring revenue continuance[1]. That approach would include:

1.     Identify risks based on the nature of the activities of an organization

2.     Create and implement mitigation tasks, avoidance strategies, and risk reduction interventions

3.     Determine what gaps may exist or remain after mitigations, avoidance, and reduction of risks

4.     Determine what threshold of pain (loss of revenue, image, IT continuity, intellectual property, etc.) could be withstood

5.     Purchase insurance at an optimally low cost to provide coverage for some of those gaps relative to the established threshold.

There appear to be many exclusions within insurance policies, particularly for errors and omissions and commercial general liability coverage. There is still significant debate, and much more case law is needed, regarding the definitions of tangible and intangible assets. The IRS has tax law which directs the manner of getting tax relief subsequent to disasters, and that includes records reconstruction.[2] What if records cannot be reconstructed? Is there insurance for that situation? Probably not. Also, where there is a duty to defend vs. an indemnity clause, a company may need to pay up front for all litigation and defense and then seek return for damages from the defendants.

I’m left with enough skepticism about insurance coverage, combined with the ambiguous and contested question of whether data and IT infrastructures fall under insurance exclusions, to conclude that both insurance and a suitable and robust IT business continuity plan are necessary.

References:

[1] Parisi Jr., Robert A. and Callahan, Nancy, (2010). “Insurance Relief” in Readings in IT Business Continuity, Norwich University, Chapter 60.

[2] US Internal Revenue Service, (2006). “Reconstructing Your Records FS-2006-7”, Retrieved 10-9-10: http://www.irs.gov/newsroom/article/0,,id=152317,00.html

Annualized Loss Expectancy – Does it Work?

IT risk assessment (analysis) is a vital step in protecting an organization’s information infrastructure. It is defined by NIST in their risk management guide as “the process of identifying the risks to system security and determining probability of occurrence, the resulting impact, and additional safeguards that would mitigate the impact”. Essentially, risk assessment finds out what could go wrong, what could be damaged, how vulnerable the system is, and what can be done to prevent or mitigate the impact.

There are many models, algorithms, and tools used to perform risk analysis; however, not all of them provide as accurate an assessment as needed. For example, ALE, the Annualized Loss Expectancy, is the monetary loss expected in one year due to a risk and is the product of the SLE (Single Loss Expectancy) and the ARO (Annualized Rate of Occurrence). One of the drawbacks of ALE is that when the ARO is around one loss per year, there can be considerable variance in the actual loss. Using a Poisson distribution we can calculate the probability of a specific number of losses occurring in a given year. With expected losses ranging between 0.5 and 2.0 per year, the probability of a given number of losses occurring in any one year may vary between 2% and 60%, hardly a good basis for financial risk reduction decisions.
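
To make that variance concrete, here is a minimal Python sketch of the Poisson calculation. It assumes losses arrive independently at a constant average rate equal to the ARO; the function name poisson_pmf and the sample rates (0.5, 1.0, 2.0) are illustrative choices, not taken from any particular risk tool.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k losses in a year when `rate` losses are expected."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


if __name__ == "__main__":
    # Illustrative AROs spanning the range discussed above (assumed values).
    for rate in (0.5, 1.0, 2.0):
        row = "  ".join(f"P({k})={poisson_pmf(k, rate):.1%}" for k in range(5))
        print(f"ARO={rate}: {row}")
```

With an ARO of 0.5, for instance, the chance of a loss-free year is about 61%, while with an ARO of 2.0 it falls to roughly 14%. The same ALE formula therefore hides very different year-to-year outcomes.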

Here is an example from a dental practice:

a.      Dental X-rays are now mostly digital. In some cases, without X-rays on file or the ability to take new X-rays, dental work must be postponed and that may cost the dentist lost revenue.

b.      The dental X-rays are stored on a hard drive at the office and backed up to a thumb drive which is taken home weekly by the receptionist. If either the thumb drive or the office system is infected by a virus, then the X-rays could be at risk of tampering or loss.

c.      Assume backup is available within 4 hours of a disruption. If 8 patients are seen within the four-hour period and X-rays are needed for half of them (4), then four patients will not be able to get proper counsel from the dentist during their visit due to the unavailability of the X-ray system or of the X-rays on file.

d.      The lost revenue from one canceled patient appointment is, say, $150. For four patients, that is 4 x 150 = $600 for each occurrence. The combined hourly wage of one dental assistant and the dentist may be $200 per hour. For 4 hours of lost time with patients we have 4 x 200 = $800 per occurrence.

e.      Software can be purchased for use at the office and at the home where the thumb drive is used for backup at a cost of $500 per computer per year, so $1,000 annually.

1.   The Annualized Rate of Occurrence (ARO) is the likelihood of a risk occurring within a year. The risk of a virus infecting an IT system that is not well protected from intrusion once it is connected to the internet may be 80%, so the ARO is 80% or 0.8.

2.   The Single Loss Expectancy (SLE) is the dollar value of the loss that equals the total cost of the risk. In the case of the dentist office from ‘d’ above, each occurrence costs $600 + $800 = $1,400; this example counts four such occurrences, giving an SLE of $5,600 [4 x (600 + 800)].

3.   The ALE is calculated by multiplying the ARO by the SLE (ARO x SLE = ALE). In this case, with the $5,600 figure covering four occurrences per year, multiply $5,600 by 0.8 to give $4,480. Therefore, the ALE is $4,480.

4.   Because the ALE is $4,480, and the cost of the software that will minimize this risk is $1,000 per year, this means that the dentist would save $3,480 per year by purchasing the software ($4,480 – $1,000 = $3,480).
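
The arithmetic in steps 1 through 4 can be sketched in a few lines of Python. This is only an illustration using the figures from the example above; the function and variable names are hypothetical, and the saving assumes, as the subtraction in step 4 does, that the software removes the whole expected loss.

```python
def single_loss_expectancy(lost_revenue: float, lost_wages: float,
                           occurrences: int = 1) -> float:
    """Loss figure used as the SLE: (revenue + wages) times the occurrences counted."""
    return occurrences * (lost_revenue + lost_wages)


def annualized_loss_expectancy(aro: float, sle: float) -> float:
    """ALE = ARO x SLE."""
    return aro * sle


# Figures from the dental example (item 'd' and steps 1 to 4).
lost_revenue = 4 * 150   # four canceled appointments at $150 each
lost_wages = 4 * 200     # four hours of staff time at $200 per hour
sle = single_loss_expectancy(lost_revenue, lost_wages, occurrences=4)
ale = annualized_loss_expectancy(aro=0.8, sle=sle)

# Assumption: the $1,000 per year software eliminates the risk entirely.
safeguard_cost = 1000
net_saving = ale - safeguard_cost

if __name__ == "__main__":
    print(f"SLE: ${sle:,.0f}  ALE: ${ale:,.0f}  net saving: ${net_saving:,.0f}")
```

Changing the ARO or the number of occurrences shows how sensitive the apparent saving is to those two estimates.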

I think that ALE has its place in the risk management tool kit as long as the data can be determined to be very accurate and the occurrence rate is more than a few per year.
