Detect and Correct
IT disaster recovery also defines how the systems and data that have been backed up will be recovered. This includes:
- Who to contact for support in an emergency, and how. This must be defined in advance, because if the IT infrastructure goes down, it can take communications such as email and VoIP with it.
- Disaster levels defined for each level of failure in the system(s), with who should deal with each level, and how.
- Where to move operations until the main server(s) are running again. Usually this will be an offsite virtual server or an onsite physical server that holds a copy of your servers and machines, sent to it during backups.
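As a rough illustration of the points above, disaster levels and their out-of-band contacts could be recorded in a small lookup table. This is only a sketch: the level numbers, role names, phone numbers, and actions are hypothetical placeholders, not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisasterLevel:
    description: str   # what kind of failure this level covers
    responder: str     # who deals with it
    contact: str       # out-of-band contact that does not rely on email or VoIP
    action: str        # how it is dealt with


# Hypothetical example levels; a real plan would define its own.
LEVELS = {
    1: DisasterLevel("Single application down", "Help desk",
                     "mobile 555-0101", "Restart application from backup"),
    2: DisasterLevel("Single server down", "Systems administrator",
                     "mobile 555-0102", "Move workload to backup server"),
    3: DisasterLevel("Entire site down", "Disaster recovery lead",
                     "mobile 555-0103", "Move operations to offsite server"),
}


def escalation_for(level: int) -> DisasterLevel:
    """Return who to contact, and how, for a given disaster level."""
    if level not in LEVELS:
        raise ValueError(f"Undefined disaster level: {level}")
    return LEVELS[level]
```

Keeping a printed copy of such a table matters for the same reason given above: if the infrastructure is down, the table itself may be unreachable.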
A full system test should be done regularly to validate:
- That backup (failover) servers kick in when the primary servers go down
- That data is sent to these failover servers the correct way (switchover)
- That these failover servers can handle running critical operations
- That the system can be restored to the point before everything failed
- That the primary server(s) can again take over operations with the correct applications running and no data loss (failback)
- That server logs are updated
- That applications started for disaster recovery have shut down
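The checklist above could be run as a series of named checks, with failures collected as areas of concern for the test documentation. The sketch below assumes each check is a function returning True or False. The check names and the function names `run_checks` and `areas_of_concern` are hypothetical, and real checks would need to probe the actual servers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    note: str = ""


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run every check, recording a failure instead of stopping on an error."""
    results = []
    for name, check in checks.items():
        try:
            passed, note = bool(check()), ""
        except Exception as exc:  # a crashing check counts as a failed check
            passed, note = False, f"{type(exc).__name__}: {exc}"
        results.append(CheckResult(name, passed, note))
    return results


def areas_of_concern(results: List[CheckResult]) -> List[str]:
    """List the names of the checks that did not pass."""
    return [r.name for r in results if not r.passed]


# Placeholder checks that always pass; replace with real probes.
EXAMPLE_CHECKS = {
    "failover servers take over": lambda: True,
    "switchover sends data correctly": lambda: True,
    "failback completes with no data loss": lambda: True,
}
```

Calling `areas_of_concern(run_checks(EXAMPLE_CHECKS))` returns an empty list when every check passes. Any names it does return are candidates for the areas of concern recorded with the test results.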
Testing can be conducted on the physical system, or run on a virtual image of a system.
If business continuity requires that the servers cannot be shut down for testing purposes, testing can be run as a continuous stream in the background while other processes are operating (dynamic testing). Documentation should include the test results and any areas of concern identified.
With proper IT disaster recovery planning, business continuity can be maintained if a machine or application fails.