Need Evidential Research on Merits of Business Continuity Planning

I agree that there is a lack of real, empirical evidence of the positive value of Business Continuity in the private sector.

On the other hand, the public sector, either by directive or presumptive duty, is much further ahead in establishing continuity of operations, disaster preparedness, and other risk management initiatives and plans. This is evidenced by the numerous documented plans and conducted exercises for cities, states, court systems, state-run colleges, counties, and other government agencies. COOP plans and training abound and can be easily found with any internet search.

Research is desperately needed to determine the effectiveness of business continuity planning for an organization that may experience a major crisis or disaster. This can be measured in terms of ROI, survivability, and/or the ease of recovery and reconstitution.

Lindstedt offers two very plausible approaches to researching the positive effectiveness of business continuity planning.

  1. Go to where a real, regional disaster has occurred; speak with the businesses impacted by the event; obtain data that identifies the types of organizations by demographic (size, shape, industry, etc.); survey each business’s level of preplanning; and then follow up to see how well they fared and their degree of survivability.
  2. In those same regions: get copies of existing plans and of the response to the disaster, and measure the effectiveness of the response. Lindstedt does not offer the metrics that would be necessary to determine what ‘effectiveness’ means. I imagine that, with some thought, any BC planner could make that metrics list.

While I like the ideas presented and believe the results would be valuable, we should consider how difficult that data might be to obtain and the amount of effort necessary to conduct a thorough research project. But it is doable.

Example: wildland fire disaster. I responded with the county IMT to the Fourmile Canyon Fire in Sept 2010. I know a few of the impacted residents and fire departments who lost headquarters and stations to this sad and devastating incident. The reconstitution efforts are enormous and many residents say they simply won’t rebuild. Although very few, if any, businesses were impacted (as it relates to the topic of business recovery), the fire departments were dramatically affected: individual firefighters lost homes and a fire station itself was lost.

The topic of the fire is still a painful one to speak about, but the healing continues. If the Lindstedt proposal of doing research in an impacted disaster region is used, then a careful and delicate approach is necessary.

So, while I like the idea of the research and believe it has value, the approach must be conducted professionally, with permission, and in a way that minimizes the reliving of the event.

References:

Lindstedt, David, (2009). Three Part lecture series, MSBC Seminar 6 Week 10, Norwich Univ.

“Maintaining & Auditing Business Continuity Programs- A Plan for a Municipality”

Maintaining and Auditing a Business Continuity Program-

A Plan for a Municipality

February 12, 2011

by

Andrew M. Amalfitano


CONTENTS

I.  Introduction

II. Plan

Key Plan Steps

1. On-going

2. Awareness and Launch

3. Implement

4. Considerations

III. Audit

Standards

Audit Elements

Process

Identification of Individuals to be involved in the Audit

Functions to be Included in Audit

Audit Approach

Documents to review

Audit Instrument

Correcting Shortcomings

IV. Conclusion

Appendix A – Continuity Assistance Tool (CAT)

Appendix B: Plan Maintenance Example: National Center of State Courts

References

I.  Introduction

A well-developed but dusty plan sitting on a shelf does not ensure the City will be ready to weather a major crisis or disaster. To really be ready, the City must maintain current plans, keep people trained and informed, and exercise those plans on a periodic basis.

Scheduled, informal reviews[1] and annual, independent audits[2] are recommended and can significantly improve the overall readiness of the City. Two plans in particular that must be maintained in a current and effective condition are the Continuity of Operations Plan-COOP and the Emergency Operations Plan-EOP.

Maintaining Continuity of Operations and Emergency Operation Plans can help ensure that the City is ready for the unforeseen major crisis or disaster. This process includes the review, testing, and update of the plans on a regular and defined schedule.

Audits may not always be necessary; however, due to their independent nature, they are often a valuable check and balance on internal plan reviews. Audits can objectively determine the adequacy of controls and the level of compliance with any appropriate standards.

This document describes how to maintain continuity plans, specifically the COOP and EOP. It describes key plan steps, suggests self-assessment instruments, and makes a case for conducting both internal reviews and external audits. It identifies the appropriate standards for the public sector and describes the audit elements and process, who should be involved, the functions and documents to review, the audit approach, and how to manage shortcomings and make improvements.

II. Plan

The fundamental plan for maintaining and auditing the continuity program at a municipality is to follow the established testing, exercise, maintenance, and review process designated in the COOP plan itself.

This process includes what to review, the frequency of review and update, who is responsible for the review, and the criteria by which to determine the viability of the plan. A viable plan exists when there is proof (through training, testing, and exercises) that the plan can be implemented during a crisis or disaster and that the City’s mission essential functions can be continued successfully.

A comprehensive strategy for maintaining plans should inform the maintenance review and audit planning process. For the public sector, the establishment of a Multi-Year Strategy and Program Management Plan is recommended.[3]

Key Plan Steps:

1. On-going

a.     Take actions to revise and update plan on a periodic cycle

b.     Train new personnel and provide refresher training for others

c.      Conduct periodic exercises, follow up with corrective actions from AAR[4]

d.     Adhere to general COOP planning requirements

e.     Identify issues that may impact the COOP and drive the frequency of changes

f.       Identify the instrument(s) to be used to conduct the audit

g.      Ensure there is adequate budget and funding for exercises and plan maintenance

2. Awareness and Launch

a.     Inform those involved

b.     Get support and agreement from City functional directors

c.      Designate a review team

d.     Determine scope of the review or audit

•       A description of elements that ensure a viable COOP capability.

•       Identification of resources required to establish each element.

•       Discussion of organization-specific management and policy issues.

e.     Appoint and introduce the auditor as needed

3. Implement

a.     Begin the audit

b.     Auditor meets with designated individuals, documents specific findings, uses the identified instrument to score each function, and reports on findings.

4. Considerations

a.     Final reporting of findings

b.     Recommendations for plan maintenance improvements

c.      Identification of deficiencies and opportunities for improvement

d.     Commitment by City management to support, budget for, and rectify shortcomings by specific dates

e.     Scheduling of next audit

III. Audit

A continuity audit is an evaluation of the viability, at a point in time, of the COOP and Emergency Operations in terms of people, the City as an organization, systems, processes, and functions. The audit is conducted by an independent person or entity who will focus on the business continuity and emergency operational readiness of the City based on the plan components.

There are many benefits of a continuity audit at the City. The continuity audit can provide an independent evaluation of the COOP and EOP plans and identify strengths and weaknesses of the program. An audit can bring to light risks inherent in the plans and suggest strategies to reduce or eliminate those risks. Finally, a thorough audit will report results that include recommendations for improvements to the plans.

An audit of the emergency management/continuity of operations plans at the City will be done in two phases. In the first phase, the Manager of the Office of Emergency Management will coordinate periodic reviews, report on findings, and obtain budget and direction to make improvements. Phase two will be an annual audit conducted by an independent person or organization external to the City, that is, not an employee, vendor, or supplier, or any person directly affiliated with the City.

Standards

The most appropriate business continuity standards to follow for a municipality are those applicable to the public sector: NFPA 1600, FPC-65, and FEMA COOP Guidelines.

§  NFPA 1600

The NFPA 1600 standard establishes “…a common set of criteria for all hazards disaster/emergency management and business continuity programs”. [NFPA 1600]

NFPA 1600 is a very relevant standard designed to “…apply to public, not-for profit, non-governmental organizations (NGO), and private entities”. [NFPA] The standard addresses program improvement and provides a self-assessment tool  which can serve as a valuable means of performing a self-audit of the COOP plan. [NFPA]

§  FPC-65

The Federal Preparedness Circular-65, while designed for federal-level agencies, suggests that state and local governments develop similar continuity of operations preparedness programs that would align with the federal guidelines. As such, maintenance of the COOP should be part of a multi-year strategy and program management plan. FPC-65 includes a definition of the 11 elements that agency COOP plans and programs must contain to be considered viable. When auditing a COOP plan, each of these 11 elements should be evaluated and assessed. [US DHS-audit forum 2007]

§  FEMA Continuity of Operations Plan Guidelines

The COOP training provided by FEMA is part of the Continuity Excellence series. One of the fundamental aspects of the training describes the importance of testing, exercises, after action reporting, corrective action, and improvements. These elements constitute direction on how best to keep plans updated and to maintain and improve agency readiness. [FEMA]

In addition, there are other standards that should be reviewed for their applicability to the ‘business’ of the City.

These may include the following:

Standard | Applies to this Function
Department of Homeland Security and Federal Emergency Management Agency (DHS/FEMA), Federal Continuity Directive 1 and Federal Continuity Directive 2 | COOP Plan
Health Insurance Portability and Accountability Act (HIPAA) – Regarding medical records protections | Human Resources
National Institute of Standards and Technology (NIST) – “Contingency Planning Guide for Information Technology Systems” | Information Systems-IT
Federal Financial Institutions Examinations Council (FFIEC) | Finance and Treasury
FEMA: National Response Framework-Incident Management System – ICS | Incident Management and Emergency Operations Plan

Figure 1: Additional Standards

Audit Elements

An audit should cover a broad view of the continuity plan as well as a deep dive into any details that demand further inspection. Typically, a more detailed review is prompted by higher-level findings that reveal missing data or that are deemed inaccurate, incomplete, or suspect for any reason.

Since a COOP plan includes all essential city functions, this is the only plan that needs to be audited. However, given the criticality of emergency operations, it would be beneficial to include the Emergency Operations Plan in an audit.  Therefore, the plans to be reviewed and audited should be:

§  Continuity of Operations Plan-COOP

§  Incident Management and Emergency Operations Plan-IC/EOP

Process

The audit process can be as simple or elaborate as desired; however, simpler and shorter in duration is usually better.

The process begins with identification of those individuals to be involved with the review or audit process. This may be an individual or a team, and in the case of an audit will usually be a person external to the City.

The scope of the audit will identify which City functions, plans, and ‘territory’ will be audited. The scope should be based on applicable standards and those functions represented in the COOP or EOP plans. Any areas deemed outside of the plans should be excluded from the audit.

An approach to the audit should be established based on the goal of the audit. Since the goal of most audits is to verify that a plan exists and that there is proof it is viable, suitable standards should be used for comparison. The types of questions should be identified early in the process, along with the instrument or tool to be used to score or rate the plans.

A list of plan elements, documents to review, and people to interview should be identified and those involved should be notified in advance.

Conducting the audit should be bounded by time and scope with a description of expectations of the auditor and all those involved. This requires good, clear communication of the intent and purpose of the audit and expected outcomes.

Finally, there should be a pre-determined description of how the results will be reported, to whom, and what action will be taken with those results. Where deficiencies are identified, there should be openness to creating and implementing corrective actions, including agreement on who will be responsible and in what time frame those improvements will be accomplished.

Identification of Individuals to be involved in the Audit

A formal, annual audit can be preceded by informal, more frequent reviews. The reviews should include conversations with either the director of each city function or a person whom they designate. During the formation of the COOP plan, each function identified a representative who developed their portion of the plan. These individuals would be ideal interviewees for the audit process and should also be involved in regular plan maintenance, testing and exercising of the plan, and the review process. An audit of the EOP would best be conducted by another qualified organization that also understands the nature of emergency operations. For this City, the logical choice is the County Office of Emergency Management.

Functions to be Included in Audit

The following functions should be involved, with the director of each function being responsible for plan review and audit completion:

§  Office of the City Manager

§  Buildings & Facilities

§  Community Services

§  Finance

§  Human Resources

§  Information Technology

§  Light, Power, and Communications

§  Public Safety (Police, Fire, EMS, OEM, Emergency Communications)

§  Public Works

Audit Approach

The approach to conducting an audit should be supportive and positive with the intent of identifying opportunities for improvement. The overall goal, of course, is for the City to be operationally ready to continue mission essential functions during a crisis or disaster. The audit should support that goal.

The City Manager’s office should ensure that all departments and functions are made aware of the value of an audit and set the expectation for full cooperation. Once awareness is established, and an auditor is identified, the process should begin with a conversation and interview from the top down. The directors of each function would first be interviewed followed by a person whom they designate to represent their function. On some occasions the auditor may go beyond these two people for each function depending on what is found during the initial functional assessment.

A broad range of questions will yield an overall assessment of the general viability of the COOP and Emergency Operations Plan.

At a high level, the following types of questions should be considered:

a.     Does the COOP plan meet (as a guideline) the FPC-65 requirements?

b.     Does the EOP plan meet (as a guideline) the NFPA 1600 requirements?

c.      Do we find the specifics in each plan evident in reality? i.e. are the specifics demonstrated by adequate funding, facilities, record keeping, systems integration, trained and dedicated personnel, across all City functions?

d.     Is there adequate oversight of the COOP and EOP plans to ensure completeness and viability?

e.     Is each of the 11 elements of the COOP plan reviewed and complete?

f.       Is each of the 11 elements of the COOP plan tested and exercised at an appropriate frequency?

g.      Is there evidence of an After Action Report for each exercise and is there documentation of corrective action follow up?

h.     Does the electronic version of documentation exist, is it backed up adequately, and can it be easily produced when asked?

i.       Are plans and individual elements up to date?

With these broad, high-level questions asked, the audit can proceed into more detail as needed to gain a fuller and more accurate assessment of the current state of the COOP and EOP plans.

Documents to review

The key documents that should be kept up to date and reviewed periodically are those that support the mission essential functions of each city function. For the EOP, the entire plan including annexes and appendices should be included.

All 11 elements of the COOP plan may have documents and if so, all of these documents should be reviewed. In any case, the minimum document review list should be:

§  Mission Essential Functions

§  Key personnel contact information

§  Information System codes, software, keys, passwords

§  Vital records and data files

§  Critical vendor and supplier contact information

§  Building access and security documents

§  Plans: Continuity of Operations-COOP, Emergency Operations-EOP, Continuity of Government

Audit Instrument

The NFPA 1600 standard offers a suggested self-assessment instrument/tool which can be used by the City to perform a quick evaluation of the conformity to requirements of the COOP and EOP plans. That instrument can be found in the table labeled Table C.1. of Annex C of the NFPA 1600 standard. The tool allows indication of “conformity, partial conformity, or nonconformity as well as indicate evidence of conformity, corrective action, task assignment, a schedule for action, or other information in the Comments column.” [NFPA 1600 Annex C]

In addition to the NFPA tool, FEMA offers a Continuity Assistance Tool-CAT. The CAT tool provides a way to identify the strengths and weaknesses of the City continuity plan and show areas that need improvement. See Appendix ‘A’ for more details.

Correcting Shortcomings

Any review or audit process will elicit the identification of strengths and weaknesses or shortcomings. These shortcomings should be well documented with clear and concise recommendations of what actions should be taken to make improvements. Vague generalizations are not useful and should be avoided.

As part of the steering of the review or audit, the City Manager’s office should get agreement with the functional directors as to who the audience is that will hear and consider the findings and take actions. Because the City is a municipality, ultimately any citizen should be able to see the results and the actions being taken to improve the COOP and EOP plans based on the review or audit findings.

A project plan approach should be used to track and demonstrate that improvements have been implemented. Typical tracking will include a set of numbered actions, with a description of what ‘complete’ looks like, the name of the person responsible for seeing that the improvement is completed, and an agreed-to time frame or due date.
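
As a minimal sketch of such tracking, the Python below models one numbered action with the fields named above and lists the open actions that are past their due date. The class name, field names, and sample entries are hypothetical illustrations; they are not taken from any City plan or standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CorrectiveAction:
    """One tracked improvement item (field names are illustrative assumptions)."""
    number: int            # sequential action number
    complete_means: str    # description of what 'complete' looks like
    owner: str             # person responsible for completion
    due: date              # agreed-to due date
    done: bool = False     # set to True once the improvement is verified


def overdue(actions, today):
    """Return the open actions whose due date is before `today`, earliest first."""
    late = [a for a in actions if not a.done and a.due < today]
    return sorted(late, key=lambda a: a.due)


# Hypothetical sample entries, for illustration only.
actions = [
    CorrectiveAction(1, "Contact list reissued to all functions", "HR Director", date(2011, 3, 1)),
    CorrectiveAction(2, "Backup restore test documented", "IT Director", date(2011, 4, 15), done=True),
    CorrectiveAction(3, "Alternate facility access verified", "Facilities Director", date(2011, 2, 20)),
]

for action in overdue(actions, today=date(2011, 3, 15)):
    print(action.number, action.owner, action.due.isoformat())
```

A spreadsheet would serve equally well; the point is only that each action carries a number, a completion criterion, an owner, and a date, so that overdue items are easy to surface.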

IV. Conclusion

This document presents a plan for maintaining the COOP and EOP plans of the City. A case is made for the benefits of conducting both a periodic internal review and an annual independent audit. A plan is proposed with key actions to be taken along with a description of the elements and approach of an audit.

The municipality as a public entity should conform with established standards from government entities, namely FEMA continuity guidelines, NFPA 1600, and other pertinent directives.

The use of suggested evaluation instruments can help bring consistency to a self-assessment and provide for a repeatable process. The document establishes the need for transparency of the findings and urges prompt and coordinated actions to fix shortcomings and institute improvements.

The end result of a proper maintenance plan and audit program will be a higher degree of assurance that the city is ready to continue mission essential functions during a crisis or disaster. This assurance can only come from a systematic and documented approach to plan maintenance that demonstrates accountability through specific actions.

Appendix A – Continuity Assistance Tool (CAT)[5]

FEMA provides a tool to help public sector organizations like the City to perform a self-evaluation of their continuity programs.

“CAT PROCESS

The process provided below is the recommended method to apply this tool:

Step 1: The continuity manager meets with functional representatives (i.e., IT manager, HR manager, Security managers, etc.) of the organization to review the CAT.

Step 2: With the assistance of the continuity manager, the functional representatives review their respective characteristics.

Answer each characteristic “Yes”, “No”, or “Not Applicable” (N/A). Flexibility is built into the assistance tool. Therefore, “Not Applicable” (N/A) may be used for those characteristics that do not apply.

Step 3: For each characteristic, a “comments” section is provided to enter any helpful notes.

Step 4: For each CMF, tally all Characteristics to obtain the “Yes”, “No”, and “N/A” CMF totals. Record this tally in the CMF header.

Step 5: Capture each CMF total in Table 2 – Continuity Management Functions Summary on page ix.”

Example: Excerpt from CAT self-assessment tool

1.6.3.6 Has the organization developed and maintained a vital records plan packet or collection that list records recovery experts or vendors? [CGC 1 Annex I, Page I-3] Yes No N/A
Comments:
1.6.3.7 Has the organization developed and maintained a vital records plan packet or collection that includes a copy of the organization’s continuity plans? [CGC 1 Annex I, Page I-3] Yes No N/A
Comments:
1.6.3.8 Has the organization reviewed its vital records plan packet or collection within the past year with the date and names of the personnel who conducted the review documented in writing to ensure that the information is current and with a copy of the review maintained at the organization’s alternate facility? [CGC 1 Annex I, Page I-3] Yes No N/A

Figure 2: FEMA Continuity Assistance Tool scoring table
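
Step 4 of the CAT process, tallying the answers for one CMF, can be sketched in a few lines of Python. This is only an illustration: the function name and the sample answers are hypothetical placeholders, not results from any real assessment and not part of the FEMA tool itself.

```python
from collections import Counter

# The three answers the CAT allows for each characteristic.
ALLOWED = ("Yes", "No", "N/A")


def tally_cmf(answers):
    """Count the "Yes", "No", and "N/A" answers for one CMF.

    `answers` maps a characteristic number to its recorded answer.
    Raises ValueError if an answer is not one of the allowed values.
    """
    for characteristic, answer in answers.items():
        if answer not in ALLOWED:
            raise ValueError(f"{characteristic}: unexpected answer {answer!r}")
    counts = Counter(answers.values())
    return {label: counts[label] for label in ALLOWED}


# Hypothetical placeholder answers for the three characteristics excerpted above.
sample = {"1.6.3.6": "Yes", "1.6.3.7": "No", "1.6.3.8": "N/A"}
print(tally_cmf(sample))
```

The resulting totals are what Step 4 asks to be recorded in the CMF header and Step 5 asks to be carried into the summary table.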

Appendix B: Plan Maintenance Example: National Center of State Courts

“PLAN MAINTENANCE: The management process of keeping an organization’s Business continuity management plans up to date and effective. Maintenance procedures are a part of this process for the review and update of the BC plans on a defined schedule.”[6]

Action | Tasks | Responsible Position | Frequency
Update and certify the Plan | • Review entire plan for accuracy; • Incorporate lessons learned from real-life activations of the plan and from testing and exercises; • Incorporate changes in policy and philosophy; • Manage distribution | [Name/ Position responsible] | Annually
Maintain and update Orders of Succession and Delegations of Authority | • Obtain current incumbents; • Update rosters and contact information | [Name/ Position] | Semi-Annually
Revise checklists and contact information for Emergency Relocation Team members | • Update and revise checklists; • Confirm/update information for members of the Emergency Relocation Team | All Court Offices | Annually
Appoint new members to the Emergency Relocation Team | • Train new members on their responsibilities; • Integrate new members into team training | [Name/ Position] | As needed
Maintain alternate facility readiness | • Check all systems; • Verify accessibility; • Cycle supplies and equipment, as necessary | [Name/ Position] | Monthly
Monitor and maintain vital records management program | • Monitor volume of materials; • Assist court staff with updating/removing files | All Court Offices | Ongoing
Train new court staff | • Include in new employee orientation | [Name Position] | Within 30 days of appointment
Orient new policy officials and senior leadership | • Brief officials on existence and concepts of the COOP plan; • Brief officials on their responsibilities under the COOP plan | [Name Position] | Within 30 days of appointment
Plan and conduct exercises | • Conduct internal COOP exercises; • Conduct joint exercises with other courts; • Conduct joint exercises with judges and staff | [Name Position] | Semi-annually; As needed
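
The recurring frequencies in the schedule above lend themselves to a simple due-date calculation. The following Python is a rough sketch under my own assumptions: the mapping of frequency labels to month intervals, the function names, and the choice to clamp the day of the month to the 28th are all illustrative and are not part of the National Center for State Courts example.

```python
from datetime import date
from typing import Optional

# Months between reviews for each recurring frequency in the table.
# "As needed" and "Ongoing" have no fixed interval, so they map to None.
INTERVAL_MONTHS = {
    "Annually": 12,
    "Semi-annually": 6,
    "Monthly": 1,
    "As needed": None,
    "Ongoing": None,
}


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamping the day to 28 so it is always valid."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    return date(year, month, min(start.day, 28))


def next_due(last_done: date, frequency: str) -> Optional[date]:
    """Return the next due date, or None when the frequency has no fixed interval."""
    months = INTERVAL_MONTHS[frequency]
    return None if months is None else add_months(last_done, months)


# Example: a plan certified on the date of this paper is due again one year later.
print(next_due(date(2011, 2, 12), "Annually"))
```

Event-driven entries such as "Within 30 days of appointment" depend on a triggering date rather than a cycle and are deliberately left out of this sketch.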

References

Beard, Mike, (2010). “Adding Value to the Enterprise Through Operational Project Auditing”. Institute of Internal Auditors. Retrieved 2-11-11: http://www.vbpm.org/home/wp-content/uploads/2010/08/Ops-n-Project-Auditing-IIA-Beach-Cities-2010009.pdf

Burtles, Jim, (2007). “Principles and Practices of Business Continuity- Tools and Techniques”. Chapter 12. Rothstein Associates, Connecticut

Crowe, Timothy, J. (2010). “Evaluating Continuity of Operations Plans and Programs”. Virginia US Department of Veterans Affairs/Office of Inspector General. Retrieved 2-12-11: http://www.floridaauditforum.org/files/meeting/2010_02/Crowe_Evaluating%20COOPs.pdf

DHS-FEMA, (2004). “Federal Preparedness Circular, FPC-65”. Retrieved 2-11-11: http://www.fema.gov/pdf/library/fpc65_0604.pdf

FEMA, (2009). “Train the Trainer Instructor Guide E/L 550”. Continuity Planners Workshop. Chapter 7 Corrective Action Planning

FEMA, (2009). “Continuity Assistance Tool (CAT)- Continuity Assistance for Non-Federal Entities (States, Territories, Tribal, and Local Government Jurisdictions and Private Sector Organizations)”. Retrieved 2-11-11: http://www.fema.gov/pdf/about/org/ncp/cat.pdf

Hiles, A. (Ed.). (2007). The Definitive Handbook of Business Continuity Management. 2nd Edition. England: John Wiley & Sons

National Center for State Courts, (2007). “A Comprehensive Emergency Management Program-Part III, Appendix A”.

NFPA, (2010). “NFPA 1600 Standard on Disaster/Emergency Management and Business Continuity Programs 2010 Edition: Annex C Self Assessment for Conformity with NFPA 1600 2010 Edition”. Retrieved 2-1-11: http://www.nfpa.org/assets/files/PDF/NFPA16002010.pdf

North Carolina Emergency Management, (2006). “North Carolina Continuity of Operations Planning Manual”. 2nd Edition. Retrieved 2-1-11: http://www.nccrimecontrol.org/div/em/documents/COOPPlannin%20Manua%202ed.pdf

Office of Emergency Management, Boulder County Colorado, (2009). “EOP Plan”, pg 67. Retrieved 2-11-11: http://www.boulderoem.com/files/Boulder%20-%20BEOP%205-5-09.pdf

Texas Dept. of State Health, (2008). “Pandemic Influenza Annex to the Continuity of Operations (COOP) Plan”. Retrieved 2-8-11: http://www.dshs.state.tx.us/comprep/pandemic/Pandemic%20Influenza%20Annex_%20DSHS%20Agency%20Level%20COOP%20Plan.pdf

US Dept. Homeland Security, (May 2007). “Evaluating Continuity of Operations Programs-Approaches & Case Study”. NY/NJ/IGAF Conference. Retrieved 2-9-11: http://www.auditforum.org/speaker%20presentations/nynj/nynjiaf%2005%202007/crowe.pdf

Wold, Geoffrey, (2010). “How to Survive a BCM Audit”. Disaster Recovery Journal. Retrieved 2-8-11: http://www.drj.com/2010-articles/summer-2010/how-to-survive-a-bcm-audit.html

End of Document


[1] “REVIEW is the internal quality control process which looks for a practical and effective capability; it checks that nothing has been overlooked; it reviews and assesses the past and considers the future; and it takes note of changing circumstances and makes recommendations where appropriate.” [Burtles]

[2] “AUDITING is the external process which looks for evidence of compliance with policy, prudence with finance, achievement of purposes and justification of claims.” [Burtles]

[3] FEMA (2009) continuity assistance tool document.

[4] FEMA (2007) http://training.fema.gov/EMIweb/edu/docs/TopOff4_afteraction_report2007.pdf

[5] FEMA Continuity Assistance Tool (2009)

[6] National Center for State Courts, (2007).

Internal or External Auditors or Both?

Business Continuity Plan Audits can be done by internal or external individuals. There is value in each approach. In either case, the person(s) conducting the audit should be competent, impartial, and objective.

When the audit is done internally, the auditor should not be from the group being audited and should not be responsible for any of the activities being reviewed, including their inputs and outputs, whether as an internal supplier or customer.

Some benefits of internal auditors include lower cost, more timely execution (they know their way around the systems and people), and quick turnaround on a report.

Also, internal auditors usually are capable of providing a more frequent checkpoint on specific portions of required processes in order to maintain best practices and suggest mid-period corrective actions. This spreads the workload so that a more comprehensive annual audit does not create a major drain on personnel and their time all at once.

External auditors are useful when whatever is being audited may require a specialist. Often, specialty functions in a company are the responsibility of just a few people and therefore there may not be any other people in the company who know what to look for or how to audit that specialty area. In that case, an external auditor may be most appropriate.

The most effective use of BC plan auditors can also be a combination of both internal and external people. By coordinating the timing and scope of these auditors, many subordinate plans can be reviewed and improved throughout the year, while the annual audit of all plans can be more comprehensive and span the collective set of plans and how well they work together.

Reference:

[1] ASIS (2009). “Organizational Resilience: Security, Preparedness, and Continuity Management Systems-Requirements with Guidance for Use”. Retrieved 2-6-11: http://www.asisonline.org/guidelines/ASIS_SPC.1-2009_Item_No._1842.pdf

Tying Bonuses to BC Plan Goals

If an auditor determines that some license was taken in reporting of status on previous audits, it should be included as a data point in the current audit.  Generally, keeping within the scope of the audit parameters, an auditor can identify non-conformance as factual, regardless of previous attempts to smooth over data or report more readiness and adherence to requirements than was actually present.

If personal or management performance goals include business continuity plan conformance to standards and bonuses are paid out on meeting such goals, then the situation can become a bit dicey. Nevertheless, past activity and reporting should not influence the current audit(or) process.

Since plan auditing can be an iterative process, consideration should be given to change management, including a review of how the performance objectives are tied to the bonus structure. I don’t believe it is the job of the auditor to suggest performance objective changes, as this flies in the face of objectivity.

In a recent corporate governance audit prep (the actual audit was performed by internal auditors) which I performed on a global consumer manufacturing business, I found an openness to understanding the process and making improvements. Most lacking was simple documentation of some very good plans, processes and procedures. Human resources played an important role in managing the audit preparation and setting the tone of expectations.

Transparency: One AAR

When documenting the results and outcomes of a disaster exercise only one report is necessary and that is the After Action Report. In it there is ample opportunity to record, exhibit, explain, and present all findings including improvement opportunities.

If a manager asked that there be two separate AARs, one internal and one external, I would make a strong case that two reports are not appropriate. In the municipal arena, the HSEEP calls for one AAR.

An AAR should never be ‘sanitized’ for auditors and external audiences. The whole purpose of conducting an audit is to bring a level of transparency to the quality of programs, in this case the disaster preparedness of the agency.

Anything less than full transparency in the AAR would be unprofessional and inappropriate, and would border on malfeasance.

Executive Presentations (of Exercise Results)

Corporate executives have long had a reputation of wanting crisp answers to specific questions. They also appreciate a fine blend of strategic thinking mixed with data-driven recommendations. When we conduct a disaster exercise, we presumably already have the buy-in of the champions. However, not everyone who will sit in on the executive presentation of findings is necessarily a supporter. Therefore, the exercise program manager must carefully craft an honest and meaningful presentation that clearly describes the key results. Those results should directly tie to corporate goals.

In a typical 15-minute executive presentation I’d keep it to about four or five slides (or pages, or whatever, depending on the media used to present). Here are the key points to cover in the disaster exercise after-action presentation:

Slide One: Set the Tone

§  The team successfully exercised three key objectives

§  We found areas for improvement
(emphasize the great value in finding this out in practice and not during a real crisis)

Slide Two: Key Objectives

§  High level overview focused on key objectives and outcomes

§  Provide fact-based comparison to benchmarks and industry standards

(no editorials; use action-oriented verbs, like “describe, implement, conduct, assess,” etc.)


Slide Three: Recommendations

§  Succinctly state the lessons learned focused on key objectives

§  List recommendations as actions and why

(tie to company audit results, key company initiatives, etc.; de-politicize; focus on improvements)

Slide Four: Call to Action

§  Ask for next steps, authorization to do follow-up with functions

§  Plan for next exercise

§  Recommitment by the team to continue the program, with benefits to the company

Avoid Editorial Opinion in Your AAR

When an After Action Report is developed it should be based on facts observed and discovered during the exercise. Sometimes, editorials and opinions make their way into the feedback and documented findings that are used to create the AAR. These reports should never be altered and should be retained as originally submitted. However, not every opinion needs to be nor should be included in the official AAR.

This is especially true when it is discovered that many issues raised are tied to a particular senior person in the organization. If these issues are brought out in the exercise injects, debrief, and written evaluations, then it is up to the facilitator who is developing the AAR to determine how best to manage this situation.

I have found that there are either very good reasons for the comments or that the comments are complete bunk, as in the case where the senior person is disliked or has taken a hard stand and individuals have used the exercise as an opportunity to ‘get back’.

In the former case, when there is some merit to the raised issues, I believe a one-on-one direct conversation with the senior person is appropriate. The discussion should be honest and the data should be shared. Oftentimes a seasoned professional will very aptly manage the feedback and may go public directly and ask for help in making things better. The opposite could also happen and that could be a ‘career limiting move’ on the part of the exercise facilitator.

In the end, the person responsible for the AAR must present a fact-based set of lessons learned that are focused on the objectives and not any one individual. The collective organization owns the responsibility to make improvements and that should be the focus of the follow up.

Which BCP Standard for Your Company?

When considering which standard the BCP program at your company should be based on, some look to these:

  1. ASIS SPC.1-2009, National Standard: Organizational Resilience Standard.
  2. DRII Professional Practices for Business Continuity Planners.
  3. NFPA 1600.

Since BSI has not been offered for this discussion point, I believe that NFPA 1600 has the most viability for the municipal organization. This is not simply because of the nature of emergency management in the public sector, but also because NFPA 1600 offers the widest interpretation of guidelines while still ensuring that there is some strictness in implementation (opinion).

Annex C of the NFPA 1600 provides a very useful rubric which can be used immediately, simply, and at a high level to determine the current status of an organization. [1] This self assessment for conformity can be used to identify key weaknesses and help begin a program management process that focuses on those areas in most need of improvement.

With the extremely tight budgets these days and a slow-to-recover economy, cities and counties are simply not able to implement the full array of guideline adherence typically found in the 10 professional practices of DRII.

While the ASIS standard does specify threats and hazards assessment and can be very helpful to the private sector, it focuses a great deal on topic areas not particularly useful to a public sector entity, particularly a small town under 100,000 population.

[1] NFPA (Dec 2009). “NFPA 1600 Standard on Disaster/Emergency Management and Business Continuity Programs 2010 Edition. Annex C Self Assessment for Conformity with NFPA 1600, 2010 Edition”. Retrieved 1-29-11: http://www.nfpa.org/assets/files/PDF/NFPA16002010.pdf

Maintaining BC Plans

The primary issue to consider regarding business continuity plan maintenance begins with early involvement of a team of representatives from each key department or function. Early on in the process, particularly for a Continuity of Operations Plan-COOP, public sector functions throughout the city need to understand that ongoing updates and maintenance are essential to the viability of the COOP.

I’ve accomplished this mind set by introducing awareness concepts for maintenance before beginning the COOP program. I scheduled a maintenance event for immediately following completion of the first program. This essentially provided a practice for testing our system of data backup and retrieval and loading new data items such as staffing and changing responsibilities.

Once that initial mind set is in motion as described above, we turn our attention to three key issues for how to actually perform plan maintenance successfully.

1. Central

By using a software application, each department is responsible for keeping its own section of the COOP updated in real time. This common format is very useful for supporting consistency. If a BC program manager is assigned then that person can manage a central review of the software program. This is the approach we took. However, for those companies that don’t use a software application, it may be necessary to collect hard copies or Word documents with updates. In this case, (unlike the lecture from Ms Phelps) I would require a similar format and have a central folder within a designated and protected directory which is backed up regularly. Thus, a paper (or electronic) trail is established that provides a means to validate that the plan is current.

2. Consistent

In order to ensure that the plan is consistent, the BC manager must set clear expectations. The goal is to ensure that the maintenance of the plan(s) is performed in a manner that makes it an accountable and repeatable requirement. It should be an easy-to-use tool and it should be easy to submit changes.

3. Clearly Defined Responsibilities

Identify the people in each function responsible for updating the plan and representing the teams during meetings, practices, and training.

The BC manager can publish a schedule, issue reminders to review updates, and institute a mandatory return receipt policy that acknowledges that those who needed to receive direction and provide input have in fact received such notice.

Maintaining and storing plans must occur according to a specific set of guidelines. Some of the pitfalls of maintaining plans are keeping them up to date, storing them, and backing up the data. If there are multiple copies for redundancy, then updating all the copies can be tedious, time consuming and costly.

To ensure successful plan maintenance:

§  Identify roles and responsibilities

§  Identify which plans are to be maintained by whom and by when

§  Use an easy software application or method

§  Make remedies clear so improvements can be tracked readily
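
As a rough illustration of the checklist above, here is a minimal Python sketch of how a BC manager might track who maintains which plan section and by when. The class name, field names, review intervals and sample data are all hypothetical; they are assumptions made for the example and not part of any particular COOP software.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PlanSection:
    name: str            # which plan or section is maintained (hypothetical)
    owner: str           # function responsible for keeping it current
    last_reviewed: date  # when the owner last confirmed it is current
    review_days: int     # assumed review interval set by the BC manager


def overdue_sections(sections, today):
    """Return the sections whose review interval has lapsed as of `today`."""
    return [
        s for s in sections
        if today - s.last_reviewed > timedelta(days=s.review_days)
    ]


# Illustrative data only.
sections = [
    PlanSection("Staffing roster", "Human Resources", date(2011, 1, 10), 90),
    PlanSection("Data backup and retrieval", "IT", date(2010, 6, 1), 180),
]

for s in overdue_sections(sections, today=date(2011, 3, 1)):
    print(f"Reminder to {s.owner}: '{s.name}' is past its review date")
```

A list like this could feed the published schedule and reminders described earlier; the same idea works just as well in a spreadsheet.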

Some hold the position that it’s best not to hire vendors or third parties to maintain a plan. However in my experience, it can be helpful to have an independent external person assist the BC manager or designated person responsible for maintenance. Oddly, people may respond better sometimes to a third-party than their own colleagues.

Manage “Stonewalling” During an Exercise

An actual disaster exercise can be quite dynamic. Just because the exercise design team planned the event down to the finest detail does not mean that the event will go exactly as planned.

People make judgments based on ever changing information and data in real life and will do the same during a disaster exercise. Sometimes that can lead to participants taking short cuts to solve problems. For example, they may stonewall and buy time, respond to injects [1] with responses like “We’ve already fixed that”, “The situation wasn’t that bad”, or “We called the vendor and got what we needed”. If it’s clear that those responses are not possible, then it’s important for the exercise facilitator [2] to help get the exercise back on track.

In advance of the exercise, here are some ways to help prevent or reduce stonewalling:

§  Pre-arrange with the Simulation team [3] to check-in periodically to ensure things are going smoothly.

§  At exercise briefing, remind participants of ground rules and expectations to reinforce how the injects should be handled.

§  Be clear that issues cannot be resolved by waving away the problem with words, but only by making that phone call and actually resolving the situation.

§  Make injects realistic, valid and believable to help improve the likelihood that participants follow through on tasks and don’t shortcut the actions.

During the exercise, here are some ways to help prevent or reduce stonewalling:

§  Instruct evaluators [4] and observers [5] to listen as they watch for how key injects are being handled. They can inform the facilitator if tasks are being ignored or glossed over.

§  The Simulation team can use status boards to keep track of how injects are moving along and make corrections proactively by sending out new information to get the teams back on track.

§  The facilitator should roam, listen and observe the activity, even for table top exercises. If the facilitator hears something that might throw the team off or that the team has misconstrued a message, then the simulation team can be alerted with a suggested response to manage the course correction in the exercise flow.

Remember that the ultimate exercise goal is LEARNING! We want the participants to act and make decisions in a way that meets the exercise objectives, builds personal and team confidence, and helps assess our readiness to handle the real disaster. Everyone involved should do what it takes to reach these goals.

[1] Inject: During the course of an exercise, an inject is data or information provided to participants that must be acted on or considered as new to the scenario.

[2] Facilitator: Conducts and directs the exercise event and is ultimately responsible for its success.

[3] Simulation Team: A person or group of people who help conduct an exercise and who act as the outside world, offer and confirm information and direct the participants through the exercise.

[4] Evaluators: Assess the key injects and actions taken relative to the stated exercise objectives.

[5] Observers: Watch and listen to follow the exercise progress in order to learn or provide general feedback to the facilitator and design team.

Giving the Exercise Plan Out Too Soon

In general, it is not a good idea to give out the disaster exercise plan to the participants in advance of the event.

I can think of one situation when it would be OK to do so: If the narrative is of a broad regional basis and the exercise clock is several days into it, for example 3+ days. It would be a disadvantage and deemed unfair by many to expect the participants to understand what has already taken place and pretend to account for those variables in their next steps in the exercise.

A good example is an earthquake which may produce wide, broad damage. If the exercise time stamp starting point is 72 hours post quake, they would have been working on it for 3 days. It’s simply unfair to drop the participants into the scenario. They need time to think about it. A good solution is to offer information only 24 hours in advance by email. This gives participants some opportunity to think about it before the exercise without tainting the objectives to be evaluated.

The primary reason that giving out the exercise plan in advance is a bad idea is that when provided too far in advance, the participants will begin to think through and create corrections to the injects. This results in much less of a learning experience and defeats the purpose of the simulated pressure of the event.

The exercise facilitator could and should provide the appropriate training for participants. When the event draws close, an email notification can go out and include generic information to help focus everyone on their roles and responsibilities. Also, it can include reinforcement of how the event will unfold and better prepare the participants for the exercise without actually giving away the story line of the narrative.

Disaster Exercise – Observing and Evaluating

Exercise activity is best observed and evaluated by individuals specifically assigned to the task of observation or evaluation.

A good rule of thumb is to have enough evaluators to observe each and every key activity, inject, and response by the participants. Often, it is either cost prohibitive or simply not feasible to have more than one or two evaluators. In such cases, the exercise should be designed to accommodate the number of evaluators.

The inject messages and interrupts that help change and move the scenario along should be spaced in such a way that the evaluator(s) has time to adequately observe and note each inject or key activity.
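
To make the spacing idea concrete, here is a small Python sketch that assigns injects to evaluators in rotation and delays each round until the previous observation window has closed. The function name, the fixed observation window and the sample inject titles are assumptions for illustration; a real exercise timeline would be adjusted by the design team.

```python
def schedule_injects(inject_names, evaluators, observe_minutes):
    """Assign injects round-robin to evaluators, spacing each round by
    `observe_minutes` so no evaluator has two injects at the same time."""
    schedule = []
    for i, name in enumerate(inject_names):
        evaluator = evaluators[i % len(evaluators)]
        round_number = i // len(evaluators)
        start_minute = round_number * observe_minutes
        schedule.append((start_minute, name, evaluator))
    return schedule


# Hypothetical injects and a single evaluator with a 15-minute window.
plan = schedule_injects(
    ["Power outage reported", "Media inquiry", "Vendor unreachable"],
    ["Evaluator 1"],
    15,
)
for start, name, evaluator in plan:
    print(f"T+{start:>3} min  {name}  (observed by {evaluator})")
```

With fewer evaluators the same list of injects simply stretches over a longer exercise clock, which is the trade-off the design needs to accommodate.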

A good evaluator should above all be honest, fair, objective and a keen observer of detail while at the same time one who can interpret actions and activities as they unfold. In addition, a good evaluator should be able to:

§  understand the organization and how it operates

§  be available to attend meetings

§  discern between the exercise activity and the people, and focus on the activity

§  determine if the exercise objectives are being met

§  know, if problems arise, how to inform the facilitator in a way that helps move the scenario along and keeps things ‘alive’

§  observe without getting drawn into the scenario or getting distracted

§  clearly utilize objectives, observe outcomes, and track key injects to successful conclusion.

In order for the participants to learn from the exercise in a supportive way, evaluators should refrain from making personal attacks or singling out an individual’s behavior. Even singling out individuals who may have stood out doing a task very well can alienate others.

Comments that are helpful are those that connect the tasks performed to the objectives of the exercise. Also, the evaluators can acknowledge an issue that may have been caused by the nature of the event, the inject message design, or an inability to fully simulate a particular activity.

In the event that an evaluator primarily critiques individuals, the facilitator can do a few things to help mitigate any backlash or uneasiness. The facilitator can help defuse the situation immediately by mentioning the issue in a broader context. Of course, a confident facilitator could counteract the individual critique by simply saying “we are not here to point out individual actions, we are a team and we all own every aspect of how the scenario unfolded. Therefore, let’s refrain from directing comments to individuals.”

Prior to the exercise, the evaluators should be instructed on how to provide proper feedback in a group. If needed afterward, the facilitator should meet privately with the evaluator(s) to correct any inappropriate approach.

Leading the Disaster Exercise Design Team

Exercise design team meetings are used to orient the design team members to goals and objectives, brainstorm a story narrative, and create injects into the scenario to validate the areas being assessed. Further, the design meetings are used to prepare all aspects of the exercise event including materials like participant guides, actor and simulator team instructions, and evaluation forms.

The length of each design meeting and the number of meetings needed depend on the size and complexity of the exercise. Full-scale operational exercises may take several months to prepare, whereas a simple table-top scenario may only take a few days or meetings.

If there were only time for two design team meetings, I would approach the meeting agendas as follows:

Meeting 1:

  • Orient participants to the exercise: explain what is important and paint a picture of success
  • Review plan, goals, and objectives; describe the basic narrative
  • Brainstorm to obtain team input regarding what might happen, what would be impacted.
  • In real time, update the scenario based on modifications to the 5 buckets with new injects
  • Assign homework for the team to create 5-7 new injects

Meeting 2:

  • Review every new inject
  • Have the team validate injects and eliminate any conflicts, make it real!
  • Brainstorm what has not been covered
  • Possibly suggest more homework to evolve the narrative with new modifications
  • Review final preparations for the exercise event

Exercise Injects

Table-top disaster exercises provide a safe, low-stress environment within which participants can validate policy and procedure, consider what-if scenarios, and evaluate and assess their capabilities to manage a major incident.

Key to bringing practical realism to a disaster exercise are “injects”. An inject is new data or information. The inject is provided to the participants by the facilitator, evaluator, simulation team or others. The inject continues the story begun in the baseline narrative and helps move the storyline farther along the continuum.

There are five inject categories:

§  People

§  Facilities

§  Technology

§  Mission-critical activities at risk

§  Communications

There may be other injects of a broader type or nature, as well.

The design team should research the type of hazards and threats that could impact the mission critical functions, and/or may focus a particular exercise on one of the five inject categories, or both. Sources of information include:

§  Subject matter experts of important functions

§  HSEEP lessons learned (FEMA)

§  Priority injects necessary to assess specific functions can come from other departments that use that function and thus are internal customers

§  Process and standard operating procedures of the organization

§  Injects should reflect the basic goals and objectives, therefore resources associated with the objectives can be consulted

§  Business impact analysis or service impact assessments

For example, if the scenario is an Emergency Sheltering Exercise, important inject categories would be People and Communications. In this case, injects might include citizens looking for missing loved ones, medical emergencies with evacuees at a shelter, and press and media seeking information.
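
A design team could also check a draft inject list against the categories it intends to exercise. The Python sketch below is one hedged way to do that; the enum, the function name and the sample draft are hypothetical and only mirror the five categories and the sheltering example described here.

```python
from enum import Enum


class InjectCategory(Enum):
    # The five inject categories listed in this section.
    PEOPLE = "People"
    FACILITIES = "Facilities"
    TECHNOLOGY = "Technology"
    MISSION_CRITICAL = "Mission-critical activities at risk"
    COMMUNICATIONS = "Communications"


def uncovered_categories(injects, required):
    """Return the required categories that no drafted inject addresses."""
    covered = {category for _, category in injects}
    return [c for c in required if c not in covered]


# Hypothetical draft for an Emergency Sheltering Exercise.
draft_injects = [
    ("Citizens looking for missing loved ones", InjectCategory.PEOPLE),
    ("Medical emergency with evacuees at a shelter", InjectCategory.PEOPLE),
]
required = [InjectCategory.PEOPLE, InjectCategory.COMMUNICATIONS]

for category in uncovered_categories(draft_injects, required):
    print(f"No inject drafted yet for: {category.value}")
```

In this draft the press and media inject has not been written yet, so the Communications category is flagged as a gap for the next design meeting.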

Keeping Momentum in an Exercise Design Team

Once an exercise design team has started its work, momentum is important. If good effort was made early on to include all the key players and commitment was confirmed, then the process should go smoothly.

Invariably, everyday schedules or interrupts get in the way or perhaps some design team members become disenchanted or become too busy with other work. When this happens the result may be lackluster creation of injects. If the missing or inadequate injects are mission-critical, then the design team has a problem that needs immediate attention. Remember the goal of injects is to challenge the exercise participants.

As team leader, here are a few suggestions to get the process back on track:

1.      Understand. Contact the members in question and simply find out what’s going on. Without knowing the reasons for missing work, no further actions will be helpful.

2.      Importance. Remind the members of the importance of their input to challenge the exercise participants.

3.      Help. Determine if there is any misunderstanding in how to go about creating injects or helping with the scenario framework. Offer help and bring clarity to the issue and restate the expectations.

4.      Recommit. Obtain a recommitment from the members including a due date.

5.      Escalate. If necessary, it’s now time to speak directly to the member’s manager or supervisor. Either get someone else from the same function to do the work or have the manager speak with the member and help the assignment get completed.

6.      Last resort. Go outside the function and seek help from others who have created similar injects for similar situations. There are some good online references that describe injects for similar scenarios.

This comes down to basic team leadership. Being fair and open about the desired results and getting commitment usually helps avoid the problem.

Disaster Exercises: ‘Hard’ Incident or ‘Soft’ Event?

Whether or not an exercise is ‘hard’ or ‘soft’ often depends on the type of business or organization wishing to prepare and conduct the practice event. It could also depend on recent events in their industry or geography, and current news.

Designing the narrative of a disaster exercise can be tricky. This is especially true when it comes to providing a practical and realistic scenario, both of which are necessary for success.

‘Hard’ incidents are those described by significant, physical tragedies or crises. People can identify with a variety of expected or known hazards occurring, for example:

§  severe natural weather disaster (hurricanes, earthquakes, tornados)

§  man-made accidents (train derailment spilling hazardous materials, plane crash, cruise liner sinking)

§  terrorist attacks (anthrax release, explosions, shootings, incendiary devices)

§  technological accidents (data center fires, equipment failures, etc).

Soft events can include non-physical crises which can also have significant negative impact on a business, an organization, and people. For example:

§  name brand and image tarnished by ‘bad press’

§  security breach and information security leaks

§  poor customer service

§  bad product leading to massive product recalls

§  use of environmentally disruptive technology

§  employment or labor issues

§  geopolitics

§  insider trading, etc.

Both ‘hard’ and ‘soft’ incident scenarios have some similarities:

§  impetus that causes the incident to occur must be believable

§  psychological impacts may be present

§  need to follow an exercise plan and process template for the scenario to play out

§  decision-making, scenario interrupts, resource access are all necessary

There are some differences between ‘soft’ and ‘hard’ incidents:

  • the number of people directly involved in the actual crisis and therefore involved in the exercise may be quite small for ‘soft’ incidents (e.g. a legal battle with bad press does not typically involve more than the executives, legal counsel, perhaps the public relations department)
  • table-top scenarios are more common since the events occurring over time cannot be created without great difficulty (difficult to practice mass outcry, or actually have a breach in security)
  • simulation of contact to outside resources may be inappropriate and cause undue alarm or confusion when no real crisis exists

The ‘hard’ and realistic scenarios are often easier for most to understand, although it can be difficult to believe they will actually ever occur.

Key Elements of a Disaster Exercise Narrative

A well crafted disaster exercise narrative is a critical element of any disaster training or exercise process. The narrative sets the stage for the exercise scenario. It provides background information and helps participants approach the exercise as a real and plausible event. The narrative also sets the stage and puts the players at the beginning of the exercise.

Several sources can be found to provide guidance on which elements to include in the narrative. Of the lists I’ve reviewed and based on my personal experience writing narratives, I believe the following questions need to be answered. In so doing, the answers to these questions provide the participants with all that’s necessary to understand, begin, and complete the exercise.

What is or are the….

[Set up]

a.      Assumptions?

b.      Resources available during the exercise versus what needs to be simulated?

c.      Procedures enacted prior to disaster?

[Event]

d.      Hypothetical moment in time and location of the disaster?

e.      Relevant weather conditions, threat or exposure level and continuation possibility?

f.       Notification given, and was there any advance warning?

g.      Description of what has triggered the event and situation or sequence of events leading up to the disaster?

h.      Damage, or impact to the organization, function or population?

i.       Damage internal to the organization versus external?

j.       Description of the speed, strength, depth, or level of continued danger?

k.      Likelihood of spread of the event to grow beyond its current location and go regional?

[Response]

l.       Response by the organization to the disaster currently being taken?

m.    Current response by local emergency services and what else can be expected?

n.      Status of all personnel?

o.     Status and availability of alternate and backup locations, vendors and suppliers, and utilities?

[Other]

p.      Future predictions in the recovery scenario?

q.      Other factors that might influence emergency procedures?

Collectively, a narrative can answer these questions and provide a solid and clear foundation upon which to conduct a disaster exercise.

Reference

[1] Musson, M. (2007) Choosing and Developing the Test Scenario. In P. Rothstein (ed), Disaster Recovery Testing, Exercising Your Contingency Plan. (pp 47-56) Connecticut, Rothstein Associates

Why Are We Doing This Exercise?

It can be difficult to convince an organization to take the time and incur the cost of conducting an exercise, particularly a Functional or Full-Scale exercise. I’ve sometimes heard an answer to the basic question “Why are we doing this exercise” to be “because we’ve been told we have to”. While valid, hopefully there is a better reason. Why is this important to the exercise? Because without a compelling understanding of the ‘why’ a few things occur:

1.     Participation is average and perfunctory at best

2.     The level of possible learning may be diminished, matching the low level of enthusiasm or purpose

3.     The task of creating a suitable narrative based on clear objectives may be clouded, resulting in a mismatched scenario.

4.     The entire event could be perceived as, or actually become, a waste of time, money, and energy.

How do we resolve this ‘have to do it’ situation? I’ve found it valuable to introduce the topic and build an understanding of the value of the exercise by including those most closely impacted directly in the design process. There will still be objections and sometimes a little less than full participation; however, this can usually be overcome by building in some fun, creating some energy around small tasks, being understanding about workload (the project lead or exercise designer needs to be flexible), and of course, getting some management support in a positive way. We could also make the work part of routine expectations during the exercise cycle; that is, employees can earn performance merits by excelling at this activity. Further, they need to be given time to do the task without feeling they are neglecting other work.

It’s surprising how quickly people can come together, build a team with purpose, and rise to the challenge. Of course, don’t forget to offer appropriate praise and help along the way. Make sure each step of the process is a mini-celebration, and ensure the after action follow up encourages lessons learned and that it is rewarded.

The next time around, the answer to the question “why are we doing this” will be “because we know how, it’s important, and we care about our organization and want to be ready!”

When to Simulate and When Not to During an Exercise

One of the questions that often arises is the value of creating a full ‘simulation team’ versus using the ‘real’ people and functions during the exercise. There are pros and cons to each. In general, an exercise designer must consider how realistic a particular event needs to be, whether there are sufficient personnel, budget and other resources, and to what extent the business or organization can manage the impact of a practice scenario. Also, if there is an operational need for outside resources, will there be a cost and are they available?

In many cases, whether or not simulations are built into the scenario narrative depends on what portion of the organization or function is being exercised. To know that, we need to develop a suitable scope and set of objectives to be accomplished during a disaster exercise. Creating the scope and objectives is one of the essential steps to ensure a successful outcome.

Here are some general questions to consider that might help an exercise designer determine how much simulation to use, where to use it, and whether to simulate at all:

A comprehensive exercise program will already have evaluated its organization’s capabilities. Referring to and updating that assessment is an important step whenever a new exercise is considered for development.

The needs assessment will identify:

• Functions most requiring rehearsal.

• Potential exercise participants.

• Existing exercise requirements and capabilities.

• Plausible hazards and the priority levels of those hazards.

1.     Is the simulation part of our overall exercise program?

2.     Have we already established and evaluated our organization’s capabilities? If so, what is the purpose of the next event? We must incorporate plausible hazards and determine the feasibility of simulating outside participants versus not.

3.     Do we have the budget, time, and resources to impact operational elements of our organization? If so, then we do not need to simulate, and instead will use the ‘real’ resources.

4.     Do we want to test, evaluate, or drill on a particular resource? If so, then that resource should not be simulated and instead should be included in the scenario script as themselves.

5.     During Orientation and Table-top exercises, it may be too cumbersome to involve resources outside our organization or outside the primary function being evaluated. In such cases, it is often better to simulate. The simulation needs to be scripted with specific responses and exceptions so that some degree of realism can be evident.

6.     Do we have a simulation team that is available to act out and respond to communications by the participants? If so, the suggestions in item 5. above apply.

If these questions (there are many more) are not answered then we cannot be sure whether to include real or simulated events. In that case, the particular activity should either be eliminated from the exercise, or played down in importance. Playing down can be accomplished by writing specific directives and providing whatever data may be necessary to keep the exercise moving forward.

Obtaining Exercise Background Data

The background or preparatory data and information for continuity exercises may come from several sources. Of primary importance to the exercise designer should be a thorough understanding of the exercise objectives, scope, and participant functions. With these key elements defined, the designer can begin seeking out the information necessary to craft a suitable exercise plan.

The type of information to obtain includes

a.      communication channels, frequencies, equipment

b.      computer systems, data center capabilities

c.      confirmation from observers, controllers, evaluators

d.      confirmation of scope and purpose

e.      equipment and inventory list for event

f.       functions participating and their backup plans if the exercise needs to cease in mid-event

g.      maps, GIS data, contact lists

h.     names of participants

i.       physical limitations, constraints, and test environment

j.       recovery point and recovery time objectives (if for information systems exercise)

k.      remote site and backup or devolution site and personnel

l.       traffic flow, rerouting, and parking

m.    vehicle inventory for event

n.     vital records

o.      and more

Typical information sources may include:

a.      Functional head or management of organizations within scope of exercise

b.      Facilities dept.

c.      Security dept.

d.      Public Works (if municipality)

e.      Utility Dept (if municipality)

f.       CIO or data center director

g.      Law enforcement, fire department, EMS liaison

h.     Local geographic, weather, and crime data

i.       and more

By bringing together the data and information early in the design process, the exercise designer can begin to identify gaps and take actions to resolve those gaps. Actions to resolve may include eliminating that portion of the exercise for which data will not be available, creating the data, or getting a commitment from the data source to provide the information when needed.

Bench Strength for Disaster Exercise Teams

A disaster exercise team is arguably the most valuable ingredient in preparing an organization to manage and recover from a disaster. There are other important ingredients, for example resources, technology, tools, vendors, executive support, communications, etc. However, a team of dedicated individuals who work well together and who understand how to implement actions in support of the disaster plan strategy can be the difference between an organization recovering with minimal life safety and business impact or not.
Some organizations may believe that the only way to be prepared is to create a team which acts like a ‘well-oiled machine’ able to respond to any event with competence and handle it successfully. The individuals on that team remain intact for long periods of time and become the ones relied on to ‘save’ the company in the event of a disaster. I do not support this approach. Rather, I believe we need to cultivate a sense of preparedness throughout an organization. There can and should still be a team leader and a team of trained and skilled individuals with knowledge of various critical functions. But, in addition, the process of rotating other workers through the team can help build depth and strength, which provides robustness. Robustness is needed especially during times of uncertain disaster or crisis for one primary reason: the team members themselves may be impacted by the event and not available to help the organization respond and recover.

Conducting exercises with a broader set of individuals affords us the opportunity to cross-train for redundancy of personnel and get an additional set of eyes on how we perform. This additional input can help us learn from the exercise event and build a sense of the larger team, as in ‘we’re all in this together’, as opposed to one designated group of people being responsible to ‘save us’.

“Carrots and Sticks” Don’t Always Motivate

Emergency response exercises are a very valuable learning tool and can produce great energy and enthusiasm, regardless of the degree of successful completion or task accomplishment.

In a recent motivational talk, Dan Pink shares the concept that businesses do not always follow established scientific findings on how humans are motivated. Pink provides research that indicates people often perform to a higher degree of success without an organization imposing the twentieth century ‘carrots and sticks’ approach to management. This was found to be especially true where tasks and assignments required the application of cognitive skills beyond simple and narrowly defined tasks. By contrast, simple defined tasks with narrow objectives can be achieved with higher performance using some amount of reward and discipline.

In my experience preparing and conducting disaster drills and similar exercises, a combination of guidelines and well-known goals and objectives, a solidly built team, and a well-orchestrated event are critical to success. I have found neither financial incentive nor punitive pressure to be of any value, particularly in the public sector setting. Here, employees are often motivated to do a good job and have a high level of pride in what they do for the community. Excelling during a disaster exercise and knowing that they can do the job well is incentive enough.

Of course, there are other very important steps in preparing for and conducting a successful exercise including clearly defined goals, selection of a skilled team of individuals, securing and use of resources, debriefing for lessons learned, and documentation of after action reports, to name a few.

I think establishing a sense of ownership and continuous improvement is the strategy that can produce the desired effect: the internal customer base (executives or leaders) and the external clients (big ‘C’, i.e. citizens or paying customers) feel confident in the organization and will continue to offer support and do business with the organization.

Those who have conducted exercises and drills know that such events, like the real thing, often don’t go as planned. The plan on paper often does not directly apply on first pass and needs updating and a different approach, mainly because every disaster presents a different set of inputs and circumstances, often unpredictable. Learning from our mistakes and learning from the variety of possible nuances of an event can help us be more ready for both the next drill and a real disaster.

Bottom line: cultivate a sense of ownership, awareness, pride, and motivation to be ready for a crisis or disaster.

We Need More than RAID-1

Backups of computer data are needed more today than ever before. In addition to common human error, our data is exposed to environmental hazards, equipment failure, and malicious attacks. The idea that disk mirroring within RAID technology is sufficient by itself to secure data is not well founded in the reality of information systems.

Many believe, and mostly rightfully so, that the advantage of RAID is that fault tolerant and redundant backup schemes exist within the technology to help prevent or predict failure, or at the very least, offer an economical way to recover as much (potentially lost) data as possible. Unfortunately, RAID technology without an optimized backup and recovery strategy (across the enterprise) can leave a company without data when it’s needed, with a longer than desired recovery time, or worse, with irretrievable data loss. The primary reason is that technology is not perfect and will eventually fail; the architecture is not perfect either, and system design compromises are often necessary to control costs or access constraints.

Disk mirroring is a very commonly found version of RAID technology often called RAID-1. It costs 2-3 times as much as single disk striping (RAID-0), has higher reliability than all but RAID-6, and has higher transfer rates and maximum I/O rates than a single disk. RAID-1 has become the data backup scheme of choice for many a business.

In another life, I was the lead reliability engineer on early versions (some say the first) of a RAID-6 subsystem technology, a product known as Iceberg. The RAID-6 claim to fame is that it is the most reliable RAID technology. And, even with such promises of high availability, additional backup and recovery schemes were (and are) necessary. RAID represents the technology including hardware, software, peripheral devices and the mechanisms to connect to the user community. However, a proper backup and recovery strategy considers the failure mechanisms, disaster impacts, downtime precipitators, and human factors that will inevitably interrupt the backup system.
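To illustrate why mirroring alone is not a backup, here is a minimal sketch in Python. It is a toy model, not a description of any real RAID controller: the class `MirroredVolume`, its methods, and the file name `payroll.db` are all hypothetical names made up for illustration. The assumption it encodes is simply that a mirror applies every write and every delete to both disks, so it survives a disk failure but faithfully replicates a human error.

```python
import copy


class MirroredVolume:
    """Toy RAID-1 model (hypothetical): every change is applied to all live disks."""

    def __init__(self):
        self.disks = [{}, {}]  # two mirrored disks, each a name -> data mapping

    def _live_disks(self):
        # A failed disk is represented as None and is skipped.
        return [disk for disk in self.disks if disk is not None]

    def write(self, name, data):
        for disk in self._live_disks():
            disk[name] = data

    def delete(self, name):
        # The mirror replicates deletions just as faithfully as writes.
        for disk in self._live_disks():
            disk.pop(name, None)

    def fail_disk(self, index):
        self.disks[index] = None

    def read(self, name):
        for disk in self._live_disks():
            if name in disk:
                return disk[name]
        return None

    def snapshot(self):
        # A separate point-in-time copy, standing in for a real backup.
        live = self._live_disks()
        return copy.deepcopy(live[0]) if live else {}


def run_scenario():
    volume = MirroredVolume()
    volume.write("payroll.db", "v1")
    backup = volume.snapshot()          # backup taken before anything goes wrong

    volume.fail_disk(0)                 # hardware failure: the mirror still serves data
    after_disk_failure = volume.read("payroll.db")

    volume.delete("payroll.db")         # human error: removed from every live disk
    after_human_error = volume.read("payroll.db")

    volume.write("payroll.db", backup["payroll.db"])  # only the backup can restore it
    after_restore = volume.read("payroll.db")
    return after_disk_failure, after_human_error, after_restore


if __name__ == "__main__":
    print(run_scenario())
```

In this sketch the data survives the disk failure, disappears after the accidental delete, and comes back only because a separate snapshot existed.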

IT System or Users Own Data Backup? Be Careful….

Users are responsible for their own backup of critical data. Which method they choose is up to them, usually. More importantly, the consequences of choosing incorrectly may pose significant financial or safety issues. Or, the results may be inconsequential depending on the nature and importance of the data.

Whether or not local hard drive data stored by users should be backed up elsewhere is a matter of IT policy decisions. Unlike the days of spinning hard drives or tape being the only available backup scheme, today (2010) we have more sophisticated means of saving important data. For example, continuous data protection (CDP) provides for instantaneous and almost infinite recovery points. Externally, cloud computing offers a cost-effective means of automatically (on a pre-determined schedule) backing up your (personal, or company) data. Products like ‘CloudSwitch’, DigiData’s Two-Stage Vaulting, and dozens of other services tout value for dollar as the solutions of choice for disaster-resistant backup.[1]

IT managers and systems administrators have often warned employees to allow the IT department to manage all backups. The reasons given are that IT backups are faster, cheaper, more reliable, and less prone to single-point failures, vis-a-vis remote backup and storage across a secure internet. Company data is vulnerable to interception, corruption, misplacement, and hacking more often when individuals attempt to do their own backup. More frequently we hear of employees needing to take extra time to redo work because they didn’t back up to a main server provided by the IT department’s central or hybrid computing system.

On the other hand, users complain about IT owning all the backup. They say that when they store it themselves they can get to their data whenever they want, don’t have to worry about system outages, and ‘don’t trust IT’ with their confidential data. IT departments are also known for sending out messages alerting employees to delete data or save it elsewhere, or otherwise lose it, because the storage maximums are being reached and space needs to be freed up to avoid incurring extra hardware costs. Now enter the cloud computing mentioned earlier: a simple (yet to be perfected) solution to help alleviate both the congestion and the higher cost of additional hardware. Further, the cloud can be used to dynamically back up only what’s new, and controls can guide that software policy decision over time. In other words, cloud backup can be optimized.

[1] Farajun, Eran, (March 2010). “Cloud Computing Backup? Five Key Questions”. Retrieved 10-22-10

Cyber Attack: Law Enforcement Helps?

Working with law enforcement can be a help and a hindrance to routine business operations.

In the event of a suspected cyber attack or other information systems-based crime, law enforcement can play a helpful and crucial role. The degree of helpfulness may depend on the size, nature, or complexity of the intrusion or breach.

Due to the manner of investigating a crime scene, law enforcement may need to interfere with the data center operations. Two reasons for this are 1) the evidence chain of custody needs to be preserved and 2) proper forensics investigation using a variety of tools often requires careful steps to retain data and original states of equipment, hardware, and software.

Some industries are required to report all information and data breaches, while for a few, reporting may be considered voluntary.

I don’t have any direct experience with such cyber cases and therefore cannot offer any insights into how it does occur. I have had conversations with both law enforcement colleagues and with a couple of business people who needed to call for help due to a suspected data security breach. My takeaway from these discussions is that when in doubt it makes good business sense to call in the authorities as early in the process as possible. Early means as soon as the breach is suspected, particularly if the industry is regulated (e.g. financial, energy etc).

I don’t think it’s prudent to worry about ‘gee what else might they [law] find when they’re here’. While there is the possibility that an inept first responder may muck up the scene, I don’t think that is common. Today, there is good solid forensics training for investigators, FBI, Post Office and other agencies to help ensure that a proper and thorough investigation is conducted.

Also, in the event that criminal or civil charges need to be brought against a perpetrator, a documented police crime scene report will be necessary. This report may also be necessary for coverage and claims to the insurance carrier.

Insurance vs BC Planning or Both?

If the cost of insurance to recover lost revenue is lower than the cost of plans and capabilities that ensure revenue continuation, an organization may be tempted to simply purchase the insurance and stop the information systems continuity planning. I think that would be a mistake in nearly all cases.

From our readings this week, particularly Parisi and Callahan (2010), I found that I would support a multipronged approach to ensuring revenue continuance[1]. That approach would include:

1.     Identify risks based on the nature of the activities of an organization

2.     Create and implement mitigation tasks, avoidance strategies, and risk reduction interventions

3.     Determine what gaps may exist or remain after mitigations, avoidance, and reduction of risks

4.     Determine what threshold of pain, vis-a-vis loss of revenue, image, IT continuity, intellectual property etc., could be withstood

5.     Purchase insurance at an optimally low cost to provide coverage for some of those gaps relative to the established threshold.

There appear to be many exclusions within insurance policies, particularly for errors and omissions and commercial general liability coverage. There is still significant debate, and much more case law is needed, regarding the definitions of tangible and intangible assets. The IRS has tax law which directs the manner for getting tax relief subsequent to disasters, and that includes records reconstruction.[2] What if records cannot be reconstructed? Is there insurance for that situation? Probably not. Also, where there is a duty to defend vs. an indemnity clause, a company may need to pay up front for all litigation and defense and then seek return for damages from the defendants.

I’m left with enough skepticism about insurance coverage, combined with the ambiguous and contested question of whether data and IT infrastructures fall under insurance exclusions, to conclude that both insurance and a suitable and robust IT business continuity plan are necessary.

References:

[1] Parisi Jr., Robert A. and Callahan, Nancy, (2010). “Insurance Relief” in Readings in IT Business Continuity, Norwich University, Chapter 60.

[2] US Internal Revenue Service, (2006). “Reconstructing Your Records FS-2006-7”, Retrieved 10-9-10:  http://www.irs.gov/newsroom/article/0,,id=152317,00.html

Annualized Loss Expectancy – Does it Work?

IT risk assessment (analysis) is a vital step in protecting an organization’s information infrastructure. It is defined by NIST in their risk management guide as “the process of identifying the risks to system security and determining probability of occurrence, the resulting impact, and additional safeguards that would mitigate the impact”. Essentially, risk assessment finds out what could go wrong, what could be damaged, how vulnerable is the system, and what can be done to prevent or mitigate the impact.

There are many models, algorithms, and tools used to perform the risk analysis; however, not all tools provide as accurate an assessment as needed. For example, ALE, the Annualized Loss Expectancy, is the monetary loss expected in one year due to a risk and is the product of the SLE (Single Loss Expectancy) and ARO (the Annualized Rate of Occurrence). One of the drawbacks of ALE is that when the ARO is around one loss per year, there can be considerable variance in the actual loss. Using a Poisson Distribution we can calculate the probability of a specific number of losses occurring in a given year. With losses ranging between 0.5 and 2.0 the probability of occurrence in any one year may vary between 2% and 60%, hardly a good way to make financial risk reduction decisions.
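As a rough illustration of that variance, the sketch below tabulates Poisson probabilities for a few annual rates. It assumes losses arrive independently at a constant average rate (the usual Poisson assumption); the function names and the particular rates shown are my own choices for illustration and are not taken from any risk tool.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k losses in a year when losses average `rate` per year."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def loss_count_table(rates=(0.5, 1.0, 2.0), max_losses=4):
    """For each assumed annual rate, list P(0 losses), P(1 loss), ... P(max_losses)."""
    return {
        rate: [poisson_pmf(k, rate) for k in range(max_losses + 1)]
        for rate in rates
    }


if __name__ == "__main__":
    for rate, probabilities in loss_count_table().items():
        cells = "  ".join(f"P({k})={p:.1%}" for k, p in enumerate(probabilities))
        print(f"ARO={rate}: {cells}")
```

With an ARO of exactly 1.0, for instance, the chance of no loss at all in a given year is about 37%, and the chance of two or more losses is about 26%, so the actual annual loss can easily be zero or double the ALE.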

Here is an example from a dentist practice:

a.      Dental X-rays are now mostly digital. In some cases, without X-rays on file or the ability to take new X-rays, dental work must be postponed and that may cost the dentist lost revenue.

b.      The dental X-rays are stored on a hard drive at the office and backed up to a thumb drive which is taken home weekly by the receptionist. If either the thumb drive or the office system is infected by a virus then the X-rays could be at risk of tampering or loss.

c.      Assume backup is available within 4 hours of a disruption. If 8 patients are seen within the four-hour period and X-rays are needed for half of them (4), then four patients will not be able to get proper counsel from the dentist during their visit due to the unavailability of the X-ray system or of the X-rays on file.

d.      The lost revenue from one canceled patient appointment is, say, $150. For four patients, that is 4 x 150 = $600 for each occurrence. The combined hourly wage of one dental assistant and the dentist may be $200 per hour. For 4 hours of lost time with patients we have 4 x 200 = $800 per occurrence.

e.      Software can be purchased for use at the office and at the home where the thumb drive is used for backup at a cost of $500 per computer per year, so $1,000 annually.

1.   The Annualized Rate of Occurrence (ARO) is the likelihood of a risk occurring within a year. The risk of a virus infecting the IT system that is not well protected from intrusion following internet connection may be 80%, so the ARO is 80% or .8.

2.   The Single Loss Expectancy (SLE) is the dollar value of the loss that equals the total cost of the risk. In the case of the dentist office from ‘d’ above, each disruption costs $600 + $800 = $1,400. Assuming four such disruptions when the risk materializes, the SLE is $5,600, [4 x (600+800)].

3.   The ALE is calculated by multiplying the ARO by the SLE (ARO x SLE = ALE). In this case, with the four disruptions per year already counted in the SLE, multiply $5,600 by 0.8 to give $4,480. Therefore, the ALE is $4,480.

4.   Because the ALE is $4,480, and the cost of the software that will minimize this risk is $1,000 per year, this means that the dentist would save $3,480 per year by purchasing the software ($4,480 – $1,000 = $3,480).
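The arithmetic in the dentist example can be checked with a short sketch. The figures are the ones used above; the function and variable names are mine, and the `disruptions` multiplier of four is the example’s own assumption, exposed as a parameter so that its effect on the result is visible.

```python
def annualized_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE x ARO."""
    return sle * aro


def dentist_example(disruptions: int = 4, aro: float = 0.8):
    """Recompute the dentist example; `disruptions` is the multiplier applied to the SLE."""
    lost_appointments = 4 * 150        # four canceled appointments at $150 each
    idle_staff = 4 * 200               # four hours of staff time at $200 per hour
    loss_per_disruption = lost_appointments + idle_staff
    sle = disruptions * loss_per_disruption
    ale = annualized_loss_expectancy(sle, aro)
    safeguard_cost = 2 * 500           # software for two computers at $500 per year
    net_benefit = ale - safeguard_cost
    return sle, ale, net_benefit


if __name__ == "__main__":
    for disruptions in (4, 1):
        sle, ale, net = dentist_example(disruptions)
        print(f"disruptions={disruptions}: SLE=${sle:,.0f}  ALE=${ale:,.0f}  net=${net:,.0f}")
```

With the multiplier of four, this reproduces the $5,600 SLE, $4,480 ALE, and $3,480 annual saving. If only a single disruption is counted, the ALE drops to $1,120 and the saving to $120, which shows how sensitive the decision is to the input assumptions.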

I think that ALE has its place in the risk management tool kit as long as data can be determined as very accurate and the occurrence rate is more than a few per year.

IT DR Plan and Business Continuity go Hand-in-Hand?

Protecting businesses and more importantly the people who work and spend time in facilities against intrusions is an important and necessary activity. We usually think of larger organizations, universities, hospitals, government facilities, banks etc as needing protection systems. In my experience, smaller entities (under 1000) rarely have the time, motivation, money, or energy to draw on to implement elaborate protections. Many do have rudimentary preventions like badge access, redundant information backups, and password protections. Some use security cameras and might have a small backup power supply like a generator.

Depending on the nature of the business and the historical incidents (of intrusions), an organization may be more or less compelled to consider protection against intrusions in their IS Continuity Plan.

I think there is a strong relationship between protection and continuity, especially at the enterprise level. I’m a fan of strategic planning. As such, when considering continuity planning, it makes sense to me that the security protection system is part of an overall continuity planning process. There may be times or situations when a security system is stand-alone due to the size or complexity of a facility. In such cases, the IT continuity plan would certainly contain a ‘chapter’ on how the IT protection is being handled and how that set of methods interfaces with the overall security infrastructure.

Often, the IT function can implement its own protection system (including data, cyber, physical, logical) without the need for integration into the larger company scheme.

However, given the choice, I am in favor of an integrated protection system. The reason for this is that trade-offs and compromises, cost-benefit analysis, and budget constraints can be optimized more efficiently when the organization takes a more global perspective. Also, sometimes one type of prevention or mitigation can easily include wider scope without much additional cost. Examples might include physical facility access/egress installations, fire protection systems, or building design and layouts.

Should Private Sector Adopt NIMS?

After absorbing readings on protecting the information infrastructure (Platt), and a lecture and commentary (Miora) on physical security and government directives (to private sector), I’m left with that uneasy feeling that comes from government acronym overload and bureaucracy.

We know that the National Incident Management System-NIMS (DHS 2004) has been declared by the DHS as a priority for managing major incidents. NIMS is a comprehensive, standardized approach that guides the public sector in responding to major emergencies and disasters systematically and consistently. The intention is that NIMS provides principles and concepts that can be jointly followed by multiple organizations, jurisdictions, and entities for the purpose of successfully preventing, responding to, and recovering from disasters and major incidents.

The National Response Framework-NRF (2008) is a guide and template of how the nation manages all hazards response to small and large incidents and disasters. Stated in the primary NRF document is the concept that non-governmental organizations-NGOs and other private sector entities can and do play a vital role in how the nation plans for, responds to, and recovers from a disaster.

I hearken back to an early lesson in business continuity which says that ‘all disasters are local’. This statement was never more true than when our county recently experienced a devastating and major wildland fire (Fourmile Canyon Fire 2010). The quick highlight is that it took more than adherence to the NRF or NIMS to respond to, contain, and control this fire, and I think it will take more than ‘the government’ to manage the recovery and help with restoration. I consider these phases, recovery and restoration, vital elements of the NIMS philosophy.

Therefore, I believe that while I don’t usually favor heavy government involvement in my everyday life, these major events can and do require a larger scale response from multiple resources. Utility companies are working day and night to restore power to residents within the fire containment area. Service providers, electricians, plumbers, construction companies, realtors, banks, and non-profits are all busy trying to help. Much of it sounds chaotic. It was chaotic (but effective) during the height of the response and it remains so now. Case in point is the (wonderful) but disjointed fundraising amid an outpouring of community empathy.

If private sector organizations get up to speed and adopt national framework systems it will make preparation, response, recovery and restoration that much more efficient and effective. The most obvious program is PS-Prep, a private and public sector partnership which essentially extends much of the same guidance for systematically managed disaster response to the private sector. I do not believe this should be mandatory, but I do think that many prominent and well-funded organizations will adopt the PS-Prep system. To the extent that large companies do so, we as a nation will be better off and better prepared.

A Virus by any other name: IT and BC work together

Business continuity and disaster recovery planners should work together to understand the potential for health related pandemics and determine how combined strategies can help mitigate impacts to a business.

A virus by any other name. Viruses are introduced into a host, spread via either standard or mutated means, and intend to do harm, either biologically affecting humans or electronically infecting computer systems. (Note: Some viral activity can provide benefits, as in ‘viral ideas’ that take hold and help spread good news or ideas.)

It’s interesting to consider how a health pandemic might impact or disrupt information systems or how a cyber attack could impact humans. In the first case, we understand that workers who become sick need to stay home and not interact with people physically. However, of course, many companies now provide remote access and work from home capabilities.

In the case of cyber attack, we don’t usually associate direct impact to humans (health). However, there are other considerations, for example, people can be stressed over the feeling of intrusion into the work environment, or they may lose work time because systems are unavailable, morale can be impacted, and a general distrust of the IT department may develop – because we expect modern IT departments to understand all the hazards and ‘take care of that for us’.

Imagine how insecure we might feel if national military defense IT systems were hacked?

“Now it is official: The most significant breach of U.S. military computers was caused by a flash drive inserted into a U.S. military laptop on a post in the Middle East in 2008.” [1]

As this week’s author points out, the costs of cascading events can overwhelm an otherwise ready organization. Continuity and disaster planners should consider both types of virus in their planning process.

References:

[1] Nakashima, Ellen (Aug 2010). “Defense official discloses cyber attack”. Retrieved 9-19-10: http://www.washingtonpost.com/wp-dyn/content/article/2010/08/24/AR2010082406154.html
