This discussion covers the development of proactive and reactive measures for dealing with critical data loss and data recovery. It includes understanding risk, cost, and effect; reviewing services and resilience mechanisms; and monitoring, analysis, and investigation.
1. Understanding the Risk To Critical Data Through Risk Assessments and Management:
Risk assessment and management begin with planning. It is essential to develop a strategic assessment plan that evaluates risk, data collection, and changes in technology. Planning should cover the process and time frame involved in recovering data, and the impact that the recovery period will have on business processes.
2. Improving Availability Through Implementing Cost-Justifiable Measures To Counter The Risk Of Data Loss:
Changes in technology, both software and hardware, can affect reliability. System availability needs to be measured in terms of cost so that financial planning can become part of risk assessment and risk management for both the short and long term. Data loss can be defined as not having access to data. Loss of access can result from a software failure, a hardware failure, or both.
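One common way to put a cost on availability is annualized loss expectancy (ALE): the estimated cost of a single loss event multiplied by how often it is expected per year. The sketch below is illustrative only; the function names and all figures are assumptions, not values from this discussion. A measure is treated as cost-justifiable when the loss it is expected to prevent each year exceeds what it costs each year.

```python
def annual_loss_expectancy(single_loss_cost, annual_rate):
    """Expected yearly loss: cost of one event times expected events per year."""
    return single_loss_cost * annual_rate


def is_cost_justified(ale_before, ale_after, annual_measure_cost):
    """True when the yearly reduction in expected loss exceeds the yearly cost."""
    return (ale_before - ale_after) > annual_measure_cost


# Hypothetical figures: a 50,000 outage expected once every two years,
# reduced to once every ten years by a measure costing 12,000 per year.
before = annual_loss_expectancy(50_000, 0.5)
after = annual_loss_expectancy(50_000, 0.1)
print(is_cost_justified(before, after, 12_000))  # True: 20,000 saved > 12,000 spent
```

The same comparison can be run separately for short-term and long-term estimates, since failure rates and measure costs tend to change as equipment ages.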
3. Reviewing All New and Changed Services and Testing All Availability and Resilience Mechanisms that Impact Critical Data:
Understanding and using mirroring software is an essential part of any data recovery plan. The role of mirroring software is to keep the recovery system synchronized with the primary system in real time. Risk assessment and management planning needs to include evaluating switching mechanisms to ensure that the pathways between primary and recovery systems are working and available. Monitoring equipment workload, age, reliability, and functionality is also important.
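As a rough illustration of checking that a pathway between systems is available, the sketch below attempts a TCP connection to each endpoint and reports whether it succeeded. It assumes the systems expose a TCP port that can be probed; the function names are hypothetical, and a successful connection only shows that the path is open, not that failover itself will work.

```python
import socket


def path_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_paths(paths, timeout=2.0):
    """Map each (host, port) pair to whether it is currently reachable."""
    return {(host, port): path_is_open(host, port, timeout) for host, port in paths}
```

Calling `check_paths` with the primary and recovery endpoints on a schedule gives a simple record of when each pathway was last known to be working.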
4. Monitoring, Measuring, Analysing, Reporting, and Reviewing Data & Service Availability:
The need for system availability makes monitoring a critical part of risk assessment and management. This includes monitoring current activity, historical activity, and data usage. Software changes should be tested and reviewed before they are introduced to the global system. Testing equipment and mechanisms is also part of monitoring and should be done in real time to produce a measurement of data transfer between the primary and recovery systems.
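One possible measurement of data transfer between the two systems is replication lag: how far the recovery system trails the primary. The sketch below assumes each system can report a timestamp for its most recent write or applied change; how those timestamps are obtained depends on the mirroring software in use, and the names, values, and threshold here are hypothetical.

```python
from datetime import datetime, timedelta


def replication_lag(primary_last_write, recovery_last_applied):
    """How far the recovery system trails the primary (never negative)."""
    lag = primary_last_write - recovery_last_applied
    return max(lag, timedelta(0))


def lag_exceeds(lag, threshold):
    """True when the lag is larger than the tolerated threshold."""
    return lag > threshold


# Hypothetical timestamps reported by each system.
primary = datetime(2024, 1, 1, 12, 0, 30)
recovery = datetime(2024, 1, 1, 12, 0, 12)

lag = replication_lag(primary, recovery)
print(lag.total_seconds())                      # 18.0
print(lag_exceeds(lag, timedelta(seconds=10)))  # True
```

Recording the lag over time also provides the historical view mentioned above, so a slow upward trend can be spotted before it becomes a loss event.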
5. Investigating Data Loss and Data Unavailability:
Risk management’s data loss plan should account for software that fails to perform as expected, human error, and mechanical malfunction. Investigation centers on the cause of the current loss and on mitigating that cause, with the goal of gathering information to identify and prevent similar loss events.
(Picture courtesy of coolmikeol)