Disaster Recovery Policy

Created by Andy Robinson, Modified on Tue, 17 Feb at 2:08 PM by Andy Robinson

The full disaster recovery policy and drills are logged under Feature 210382: Add a system for testing disaster recovery for Azure client production data. Once complete, this article will be updated with the "in place" policy.



Environment Architecture & Tenant Isolation

Tenant Segregation Model

Compucare on Azure operates within a logically segregated multi-tenant architecture hosted in Microsoft Azure UK South (primary) and UK West (disaster recovery).

Each client is provisioned with:

  • A dedicated Azure SQL Database (Live and Test), logically separated

  • Dedicated Azure Storage containers

  • Dedicated application configuration scoped to that client

No database contains data from more than one client.
No shared schema model is used.


Database Isolation

Each client database:

  • Operates independently within Azure SQL

  • Has its own backup chain

  • Has its own geo-replication relationship

  • Is restored independently of other client databases

  • Cannot query or access other client databases

Cross-database queries between client environments are not permitted.

Operational activities (restore, replication, maintenance) are performed at individual database level and cannot impact other client data.


Network Segregation

Access control is enforced using Azure SQL database-level firewall rules.

This replaced the previous server-level firewall model and provides:

  • Firewall rules scoped to individual databases

  • No inherited server-wide exposure

  • Explicit endpoint allow-listing per database

  • Reduced lateral movement risk

Access is restricted to:

  • Approved application endpoints

  • Approved VPN endpoints (where required)

  • Explicitly authorised IP ranges

There is no shared or flat network path between client databases.
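
As a minimal sketch of what a database-scoped rule looks like, the Python helper below builds a T-SQL call to sp_set_database_firewall_rule, the Azure SQL stored procedure that creates or updates a database-level firewall rule. The rule name and IP addresses are hypothetical, and the helper only produces the statement text. It assumes the statement is then run inside the individual client database by an authorised administrator, and it does not describe the tooling Streets Heaver actually uses.

```python
import ipaddress


def database_firewall_rule_sql(rule_name: str, start_ip: str, end_ip: str) -> str:
    """Build the T-SQL that creates or updates one database-level firewall rule.

    The rule only applies to the database in which the statement is executed,
    which is what keeps the allow-list scoped to a single client.
    """
    # Validate the addresses up front; a malformed address raises ValueError.
    start = ipaddress.IPv4Address(start_ip)
    end = ipaddress.IPv4Address(end_ip)
    if end < start:
        raise ValueError("end_ip must not be lower than start_ip")

    # Escape single quotes so the rule name is a safe T-SQL string literal.
    safe_name = rule_name.replace("'", "''")
    return (
        "EXECUTE sp_set_database_firewall_rule "
        f"@name = N'{safe_name}', "
        f"@start_ip_address = '{start}', "
        f"@end_ip_address = '{end}';"
    )


# Hypothetical example: allow a single approved application endpoint.
print(database_firewall_rule_sql("app-endpoint", "203.0.113.10", "203.0.113.10"))
```

Using the same address for the start and end of the range allow-lists exactly one endpoint, which matches the explicit per-database allow-listing described above.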

 

Azure SQL Databases

Hosted in Microsoft Azure UK South and replicated to UK West.


Backups

Current Policy

  • Automated Backups: Azure SQL Database provides automatic backups by default (no action required).
  • Point-in-Time Restore Verification: Test restoring backups to verify data integrity and recovery processes (to split SQL pool).
  • Frequency: Restore verification (monthly).
  • Reporting: Store results - for distribution to clients on request.


Planned Improvement

  • Point-In-Time-Restore Verification Reporting: Added to client specific audit events - User Story 275891.
  • Routine Backup Shipping: Copies of the latest backup are shipped to the client-specific storage account - User Story 254857.
  • Restore Stored Backup Verification: Test restoring backups to verify data integrity and recovery processes (to split SQL pool) - User Story 254857.
  • Frequency: Backup shipping (weekly), restore verification (monthly).
  • Reporting: Store results - for distribution to clients on request, added to client specific audit events - User Story 254857.


Performance Monitoring and Tuning

Current Policy

  • Performance Monitoring: Use Azure SQL Analytics, Query Performance Insight, or other monitoring tools to track performance metrics.
  • Index Maintenance: Rebuild indexes (5 databases chosen to re-index per night).
  • Statistics: UpdateStats.
  • Query Optimisation: Identify and optimise long-running queries.
  • Frequency: Update Stats (daily), index maintenance (weekly), query optimisation (monthly).
  • Reporting: Report slow-running queries to the Compucare 8 team.


Planned Improvements

  • Index Maintenance: Rebuild indexes more frequently per database, choosing re-indexing targets from each client's audit record of its last re-index and scheduling the work consistently every week - User Story 275903.
  • Alerting: Expanded Query Performance alerting alongside Database and Elastic Pool Alerts - User Story 275907.
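
One way the audit-driven selection of re-indexing targets could work is sketched below in Python. It assumes a mapping from database name to the date of its last recorded re-index (None if never re-indexed) and picks the least recently maintained databases first. The function name, the data shape, and the database names are hypothetical; the batch size of 5 mirrors the current nightly policy.

```python
from datetime import date
from typing import Dict, List, Optional


def pick_reindex_targets(
    last_reindexed: Dict[str, Optional[date]], batch_size: int = 5
) -> List[str]:
    """Return up to batch_size database names, least recently re-indexed first.

    Databases with no recorded re-index (None) are treated as most overdue.
    Ties are broken by name so the result is deterministic.
    """
    def sort_key(item):
        name, when = item
        # False sorts before True, so never-re-indexed databases come first.
        return (when is not None, when or date.min, name)

    ordered = sorted(last_reindexed.items(), key=sort_key)
    return [name for name, _ in ordered[:batch_size]]


# Hypothetical audit data for four client databases.
audit = {
    "client_a": date(2026, 1, 10),
    "client_b": None,
    "client_c": date(2025, 12, 1),
    "client_d": date(2026, 2, 1),
}
print(pick_reindex_targets(audit, batch_size=2))
```

Sorting by the audit date rather than rotating through a fixed list means a database that was skipped one week naturally moves to the front of the queue the next.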


Security Management

TBC


Database Maintenance

Current Policy

  • Full Integrity Check: Run DBCC CHECKDB.
  • Update Statistics: Ensure statistics are updated to maintain query performance.
  • Frequency: Full integrity check (monthly), Update statistics (daily).


Disaster Recovery Planning

Planned Policy

  • DR Drills: Conduct disaster recovery drills to test failover and recovery procedures - User Story 212928.
  • Review DR Plan: Update and review the "disaster recovery plan" based on drill outcomes.
  • Frequency: DR drills (annually), DR plan review (annually).


RTO / RPO Objectives

  • Recovery Time Objective (RTO): Target of less than 15 minutes. Azure SQL Point in Time Restore (PITR), combined with real-time replication from UK South to UK West, allows rapid recovery in the event of a disruption and keeps database downtime to a minimum.
  • Recovery Point Objective (RPO): Target of less than 1 hour. Azure SQL PITR is configured with a 31-day retention period, so data can be restored to any point within the last 31 days, and real-time replication from UK South to UK West minimises data loss.
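
To make the two objectives concrete, the Python sketch below scores a single recovery event against them. RTO is measured as the time between the outage starting and service being restored, and RPO as the gap between the newest data recovered and the moment of the outage. The 31-day PITR window is checked separately, because it limits how far back a restore can reach rather than how much recent data may be lost. The class, function names, and timestamps are hypothetical; only the three thresholds come from this policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RTO_TARGET = timedelta(minutes=15)   # maximum acceptable downtime
RPO_TARGET = timedelta(hours=1)      # maximum acceptable data-loss window
PITR_RETENTION = timedelta(days=31)  # how far back a point-in-time restore can go


@dataclass
class RecoveryEvent:
    outage_start: datetime         # when the disruption began
    service_restored: datetime     # when the database was usable again
    last_recovered_data: datetime  # newest data present after recovery


def evaluate_recovery(event: RecoveryEvent) -> dict:
    """Compare measured downtime and data loss against the RTO/RPO targets."""
    downtime = event.service_restored - event.outage_start
    data_loss = event.outage_start - event.last_recovered_data
    return {
        "downtime": downtime,
        "data_loss": data_loss,
        "rto_met": downtime < RTO_TARGET,
        "rpo_met": data_loss < RPO_TARGET,
    }


def within_pitr_window(restore_point: datetime, now: datetime) -> bool:
    """True if restore_point falls inside the 31-day PITR retention window."""
    return now - PITR_RETENTION <= restore_point <= now


# Hypothetical drill: 12 minutes of downtime, 30 seconds of lost data.
drill = RecoveryEvent(
    outage_start=datetime(2026, 2, 1, 9, 0, 0),
    service_restored=datetime(2026, 2, 1, 9, 12, 0),
    last_recovered_data=datetime(2026, 2, 1, 8, 59, 30),
)
print(evaluate_recovery(drill))
```

Recording results in this shape after each drill would give a simple pass or fail record per objective for the client-facing reports.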


Azure Storage Accounts

  • Replication: Replicated across different geographical locations (Geo-Redundant Storage).
  • Frequency: One-time setup with periodic review (annually).


Regular Backups

  • Backup Strategy: Storage accounts are backed up to an Azure Backup Vault with a 30-day retention policy. Compucare also has soft delete enabled, allowing for user-configurable retention.
  • Frequency: As per RPO (Recovery Point Objective) requirements.


Monitoring and Alerts

  • Metrics and Logs: Enable and review metrics and logs for storage accounts to monitor usage and performance, and to detect anomalies.
  • Alerts: Set up alerts for critical metrics and events (e.g., storage capacity, transaction rates).
  • Frequency: Review of alerts and logs (weekly).


Data Integrity Checks

  • Azure Blob Storage: Use features like Azure Blob Storage's lifecycle management policies to automatically check and maintain data integrity.
  • Frequency: As per policy schedule (weekly).


Disaster Recovery Drills

  • Failover Testing: Conduct failover testing to ensure that data can be successfully replicated and accessed from the secondary region.
  • Recovery Procedures: Document and test the recovery procedures to ensure they are effective and up to date.
  • Frequency: Drill (annually).


Geo-Replication Testing

  • Read-Access Geo-Redundant Storage (RA-GRS): Regularly test accessing data from the secondary region in read-only mode to ensure it is available.
  • Frequency: Test data access (quarterly).
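
With RA-GRS, the read-only copy in the secondary region is reached through a separate endpoint: the storage account name with a "-secondary" suffix. The Python sketch below builds that URL for a given blob so a quarterly test can target the secondary region explicitly. The account, container, and blob names are hypothetical, the public Azure endpoint suffix (blob.core.windows.net) is assumed, and the request itself would still need valid authorisation such as a SAS.

```python
import re
from urllib.parse import quote


def secondary_blob_url(account_name: str, container: str, blob_path: str) -> str:
    """Return the RA-GRS secondary-region (read-only) URL for a blob."""
    # Storage account names are 3-24 characters, lowercase letters and digits.
    if not re.fullmatch(r"[a-z0-9]{3,24}", account_name):
        raise ValueError("invalid storage account name")
    # Percent-encode the blob path but keep '/' so virtual folders survive.
    encoded_path = quote(blob_path, safe="/")
    return (
        f"https://{account_name}-secondary.blob.core.windows.net/"
        f"{container}/{encoded_path}"
    )


# Hypothetical account and blob used purely for illustration.
print(secondary_blob_url("exampleclient01", "backups", "latest/compucare.bacpac"))
```

Reading a known blob through this endpoint, and comparing it with the copy read from the primary, is one straightforward way to confirm the secondary is available.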


Reporting Back to Clients

  • Frequency: Report of checks and results of all of the above (quarterly).


Client Access to Storage Account

  • In the unlikely event that the Streets Heaver tenant becomes inaccessible, all Streets Heaver Azure Subscriptions will enter a Disabled state. In this state, resources become read-only, so data can still be downloaded.
  • Streets Heaver will generate new Shared Access Signature (SAS) keys each month and distribute them to authorised parties, ensuring continuous access to the storage account. These SAS keys grant read-only access to the client's Azure storage account, including the latest shipped backup.
  • Clients can use these SAS keys to download their data at any time, both before and after the resources are disabled. However, in extreme scenarios, it is important to act promptly, as data will only be retained for a limited period, typically up to 90 days. After this retention period, the data may be permanently deleted.
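
For planning purposes, the retention period can be turned into a concrete download deadline. The Python sketch below does this, treating the 90-day figure as an upper bound because the policy describes it as typical rather than guaranteed. The function names and dates are hypothetical, and clients should plan to download well before the calculated date.

```python
from datetime import date, timedelta

# Assumption: 90 days is the longest retention described in this policy;
# the real period may be shorter.
MAX_RETENTION_DAYS = 90


def download_deadline(disabled_on: date, retention_days: int = MAX_RETENTION_DAYS) -> date:
    """Latest date by which data should be downloaded after resources are disabled."""
    return disabled_on + timedelta(days=retention_days)


def days_remaining(disabled_on: date, today: date,
                   retention_days: int = MAX_RETENTION_DAYS) -> int:
    """Days left before the deadline, never less than zero."""
    remaining = (download_deadline(disabled_on, retention_days) - today).days
    return max(0, remaining)


# Hypothetical scenario: subscriptions disabled on 1 March 2026.
disabled = date(2026, 3, 1)
print(download_deadline(disabled))
print(days_remaining(disabled, today=date(2026, 5, 1)))
```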


Client Access to Geo-Replication (UK West) Database

  • Read access to the Compucare database (and other databases) is available on request.


Client Business Continuity

  • It is recommended that clients use Streets Heaver's Disaster Recovery Policy in conjunction with their own to build standard operating procedures that cover incidents of escalating severity.
