Disaster Recovery Policy

Created by Andy Robinson, Modified on Wed, 4 Mar at 3:29 PM by Kyle Fotherby

The full disaster recovery policy and drills are logged under Feature 210382: Add a system for testing disaster recovery for Azure client production data. Once complete, this article will be updated with the "in place" policy.

TABLE OF CONTENTS

Environment Architecture & Tenant Isolation
- Tenant Segregation Model
- Database Isolation
- Network Segregation
Azure SQL Databases
Azure Storage Accounts
Reporting Back to Clients
Client Access to Storage Account
Client Access to Geo-Replication (UK West) Database
Client Business Continuity

Environment Architecture & Tenant Isolation

Tenant Segregation Model

Compucare on Azure operates within a logically segregated multi-tenant architecture hosted in Microsoft Azure UK South (primary) and UK West (disaster recovery).

Each client is provisioned with:

- A dedicated Azure SQL Database (Live and Test), logically separated
- Dedicated Azure Storage containers
- Dedicated application configuration scoped to that client

No database contains data from more than one client.
No shared schema model is used.

Database Isolation

Each client database:

- Operates independently within Azure SQL
- Has its own backup chain
- Has its own geo-replication relationship
- Is restored independently of other client databases
- Cannot query or access other client databases

Cross-database queries between client environments are not permitted.

Operational activities (restore, replication, maintenance) are performed at individual database level and cannot impact other client data.

Network Segregation

Access control is enforced using Azure SQL database-level firewall rules.

This replaced the previous server-level firewall model and provides:

- Firewall rules scoped to individual databases
- No inherited server-wide exposure
- Explicit endpoint allow-listing per database
- Reduced lateral movement risk

Access is restricted to:

- Approved application endpoints
- Approved VPN endpoints (where required)
- Explicitly authorised IP ranges

There is no shared or flat network path between client databases.
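
As an illustration of per-database allow-listing, the sketch below generates the T-SQL call for a database-level firewall rule. It assumes the standard Azure SQL stored procedure sp_set_database_firewall_rule, which must be run while connected to the target database. The rule name and IP addresses are hypothetical placeholders, not values from Streets Heaver's configuration.

```python
import ipaddress


def firewall_rule_sql(rule_name: str, start_ip: str, end_ip: str) -> str:
    """Return a T-SQL call that creates or updates a database-level firewall rule."""
    # IPv4Address raises ValueError if the address is malformed.
    start = ipaddress.IPv4Address(start_ip)
    end = ipaddress.IPv4Address(end_ip)
    if end < start:
        raise ValueError("end_ip must not be lower than start_ip")
    # Escape single quotes so the rule name is a safe T-SQL string literal.
    safe_name = rule_name.replace("'", "''")
    return (
        "EXECUTE sp_set_database_firewall_rule "
        f"N'{safe_name}', '{start}', '{end}';"
    )


# Hypothetical rule: allow a single application endpoint.
print(firewall_rule_sql("client-a-app", "203.0.113.10", "203.0.113.10"))
```

Validating the range before building the statement keeps a malformed or reversed range from reaching the database.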

Azure SQL Databases

Client databases are hosted in Microsoft Azure UK South and replicated to UK West.

Backups

Current Policy

Automated Backups: Azure SQL Database provides automatic backups by default (no action required).
Point-in-Time Restore Verification: Backups are test-restored to the split SQL pool to verify data integrity and the recovery process.
Frequency: Restore verification (monthly).
Reporting: Results are stored and distributed to clients on request.
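
A monthly restore verification could be scripted along the following lines. This is a minimal sketch that only builds the argument list for the Azure CLI command az sql db restore. The resource group, server, database, and pool names are hypothetical, and the naming scheme for the restored copy is an assumption rather than the process actually in use.

```python
from datetime import datetime, timezone


def pitr_restore_command(resource_group, server, database, restore_point, elastic_pool):
    """Build an `az sql db restore` argument list for a verification restore."""
    if restore_point.tzinfo is None:
        raise ValueError("restore_point must be timezone-aware")
    # Assumption: the restore point is passed to the CLI in UTC.
    utc_point = restore_point.astimezone(timezone.utc)
    # Hypothetical naming scheme for the restored copy.
    dest_name = f"{database}-verify-{utc_point:%Y%m%d%H%M}"
    return [
        "az", "sql", "db", "restore",
        "--resource-group", resource_group,
        "--server", server,
        "--name", database,
        "--dest-name", dest_name,
        "--time", utc_point.strftime("%Y-%m-%dT%H:%M:%S"),
        "--elastic-pool", elastic_pool,
    ]


# Hypothetical names throughout.
command = pitr_restore_command(
    "rg-example", "sql-example", "clientdb",
    datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc), "split-pool",
)
print(" ".join(command))
```

Returning a list rather than a single string lets the caller pass it to subprocess.run without shell quoting concerns.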

Planned Improvements

Point-in-Time Restore Verification Reporting: Verification results are added to client-specific audit events - User Story 275891.
Routine Backup Shipping: Copies of the latest backup are shipped to the client-specific storage account - User Story 254857.
Restore Stored Backup Verification: Shipped backups are test-restored to the split SQL pool to verify data integrity and the recovery process - User Story 254857.
Frequency: Backup shipping (weekly), restore verification (monthly).
Reporting: Results are stored for distribution to clients on request and added to client-specific audit events - User Story 254857.

Performance Monitoring and Tuning

Current Policy

Performance Monitoring: Use Azure SQL Analytics, Query Performance Insight, or other monitoring tools to track performance metrics.
Index Maintenance: Table indexes are re-indexed once per week.
Statistics: Database statistics are updated (UpdateStats).
Query Optimisation: Identify and optimise long-running queries.
Frequency: Update Stats (daily), index maintenance (weekly), query optimisation (monthly).
Reporting: Slow-running queries are logged and reported to the Compucare 8 team.
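
The frequencies above can be expressed as a simple due-date check. The sketch below is illustrative only: the task names are hypothetical labels, and "monthly" is approximated as 30 days, which is an assumption.

```python
from datetime import date, timedelta

# Cadences taken from the policy; "monthly" approximated as 30 days (assumption).
INTERVALS = {
    "update_statistics": timedelta(days=1),
    "index_maintenance": timedelta(days=7),
    "query_optimisation": timedelta(days=30),
}


def tasks_due(last_run: dict, today: date) -> list:
    """Return the tasks whose interval has elapsed since they last ran."""
    due = []
    for task, interval in INTERVALS.items():
        last = last_run.get(task)
        # A task that has never run is always due.
        if last is None or today - last >= interval:
            due.append(task)
    return due


# Hypothetical last-run dates for one client database.
history = {
    "update_statistics": date(2025, 3, 3),
    "index_maintenance": date(2025, 3, 1),
    "query_optimisation": date(2025, 2, 1),
}
print(tasks_due(history, date(2025, 3, 4)))
```

Keeping the intervals in one table makes a later change of cadence a one-line edit.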

Planned Improvements

Index Maintenance: Index rebuilds are carried out more frequently per database. Re-indexing targets are chosen from the client audit record of the last re-index and scheduled consistently every week - User Story 275903.

Security Management

TBC

Database Maintenance

Current Policy

Full Integrity Check: Run DBCC CHECKDB.
Update Statistics: Ensure statistics are updated to maintain query performance.
Frequency: Full integrity check (monthly), update statistics (daily).
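
As a sketch of how these two tasks might be batched, the example below maps each cadence to a T-SQL statement run against the connected client database. DBCC CHECKDB is named in the policy. The use of sp_updatestats for the statistics update and the chosen CHECKDB options are assumptions, as the policy does not say how either is run.

```python
# Hypothetical mapping from the cadences in this policy to T-SQL statements.
MAINTENANCE_SQL = {
    # Assumption: statistics are refreshed with the built-in sp_updatestats.
    "daily": ["EXEC sp_updatestats;"],
    # Integrity check on the current database, reporting errors only.
    "monthly": ["DBCC CHECKDB WITH NO_INFOMSGS, ALL_ERRORMSGS;"],
}


def maintenance_batch(cadences) -> str:
    """Join the statements for the given cadences into one script."""
    statements = []
    for cadence in cadences:
        # A KeyError here signals a cadence the policy does not define.
        statements.extend(MAINTENANCE_SQL[cadence])
    return "\n".join(statements)


# On the day the monthly check falls due, both cadences apply.
print(maintenance_batch(["daily", "monthly"]))
```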

Disaster Recovery Planning

Planned Policy

DR Drills: Conduct disaster recovery drills to test failover and recovery procedures - User Story 212928.
Review DR Plan: Review and update the disaster recovery plan based on drill outcomes.
Frequency: DR drills (annually), DR plan review (annually).

RTO / RPO Objectives

Recovery Time Objective (RTO): Azure SQL Point in Time Restore (PITR), combined with real-time replication from UK South to UK West, allows rapid recovery in the event of a disruption. The target RTO is less than 15 minutes.
Recovery Point Objective (RPO): PITR is configured with a 31-day retention period alongside real-time replication from UK South to UK West, so data can be restored to any point within the last 31 days. The target RPO is less than 1 hour.
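
A drill outcome can be checked against these targets with a small calculation. The sketch below is illustrative: the DrillResult fields are hypothetical names, and recovery time and data loss are measured in the simplest way (service restoration minus outage start, and outage start minus the last recovered transaction).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RTO_TARGET = timedelta(minutes=15)  # target stated in the policy
RPO_TARGET = timedelta(hours=1)     # target stated in the policy


@dataclass
class DrillResult:
    outage_start: datetime
    service_restored: datetime
    last_recovered_transaction: datetime

    @property
    def recovery_time(self) -> timedelta:
        return self.service_restored - self.outage_start

    @property
    def data_loss_window(self) -> timedelta:
        return self.outage_start - self.last_recovered_transaction


def meets_objectives(result: DrillResult) -> dict:
    """Compare measured recovery time and data loss against the targets."""
    return {
        "rto_met": result.recovery_time < RTO_TARGET,
        "rpo_met": result.data_loss_window < RPO_TARGET,
    }


# Hypothetical drill: 12 minutes to restore, 30 seconds of lost data.
drill = DrillResult(
    outage_start=datetime(2025, 3, 4, 10, 0, 0),
    service_restored=datetime(2025, 3, 4, 10, 12, 0),
    last_recovered_transaction=datetime(2025, 3, 4, 9, 59, 30),
)
print(meets_objectives(drill))
```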

Azure Storage Accounts

Replication: Storage accounts are replicated to a second geographic region using Geo-Redundant Storage (GRS).
Frequency: One-time setup with periodic review (annually).

Regular Backups

Backup Strategy: Storage accounts are backed up to an Azure Backup Vault with a 30-day retention policy. Soft delete is also enabled, with a configurable retention period.
Frequency: As per RPO (Recovery Point Objective) requirements.

Monitoring and Alerts

Metrics and Logs: Enable and review metrics and logs for storage accounts to monitor usage and performance, and to detect anomalies.
Alerts: Set up alerts for critical metrics and events (e.g., storage capacity, transaction rates).
Frequency: Review of alerts and logs (weekly).

Data Integrity Checks

Azure Blob Storage: Use features like Azure Blob Storage's lifecycle management policies to automatically check and maintain data integrity.
Frequency: As per policy schedule (weekly).
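
Lifecycle management rules mainly handle tiering and expiry of blobs, so an explicit checksum comparison is one way to evidence integrity. The sketch below assumes the blob's stored Content-MD5 value (a base64-encoded MD5 digest) has already been fetched along with the blob content. How both are retrieved is outside the example.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return the base64-encoded MD5 digest, the format used by Content-MD5."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def is_intact(data: bytes, expected_md5_b64: str) -> bool:
    """True when the downloaded bytes match the checksum stored with the blob."""
    return content_md5(data) == expected_md5_b64


# Hypothetical check: the stored checksum was recorded at upload time.
payload = b"example blob content"
stored_checksum = content_md5(payload)
print(is_intact(payload, stored_checksum))
print(is_intact(payload + b"corrupted", stored_checksum))
```

MD5 is used here only to detect accidental corruption, not as a security control.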

Disaster Recovery Drills

Failover Testing: Conduct failover testing to ensure that data can be successfully replicated and accessed from the secondary region.
Recovery Procedures: Document and test the recovery procedures to ensure they are effective and up-to-date.
Frequency: Drill (annually).

Geo-Replication Testing

Read-Access Geo-Redundant Storage (RA-GRS): Regularly test accessing data from the secondary region in read-only mode to ensure it is available.
Frequency: Test data access (quarterly).
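
With RA-GRS, the secondary region is reached through a read-only endpoint formed by appending -secondary to the storage account name. The sketch below builds the primary and secondary URLs for the same blob so a quarterly test can read both and compare them. The account, container, and blob names are hypothetical.

```python
from urllib.parse import quote


def blob_urls(account: str, container: str, blob_name: str) -> dict:
    """Return the primary and RA-GRS secondary URLs for one blob."""
    # quote() percent-encodes unsafe characters but keeps "/" in blob paths.
    path = f"{container}/{quote(blob_name)}"
    return {
        "primary": f"https://{account}.blob.core.windows.net/{path}",
        "secondary": f"https://{account}-secondary.blob.core.windows.net/{path}",
    }


# Hypothetical names throughout.
urls = blob_urls("clientstorage", "backups", "2025/03/latest backup.bak")
print(urls["primary"])
print(urls["secondary"])
```

A test would then issue an authenticated read against the secondary URL and confirm the content matches the primary.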

Reporting Back to Clients

Frequency: A report covering all of the checks above and their results is issued quarterly.

Client Access to Storage Account

In the unlikely event that the Streets Heaver tenant becomes inaccessible, all Streets Heaver Azure subscriptions will enter a Disabled state. In this state, resources become read-only, so data can still be downloaded.
Streets Heaver issues newly generated Shared Access Signature (SAS) keys to authorised parties each month, ensuring continuous access. These SAS keys grant read-only access to clients' Azure storage accounts, including the latest shipped backup.
Clients can use these SAS keys to download their data at any time, both before and after the resources are disabled. In extreme scenarios it is important to act promptly, because data is retained only for a limited period, typically up to 90 days, after which it may be permanently deleted.
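
A client receiving a monthly SAS key may want to confirm that it is read-only and has not expired. The sketch below inspects the standard SAS query parameters sp (permissions) and se (expiry). It assumes the expiry is given in the full UTC form YYYY-MM-DDTHH:MM:SSZ, and the URL shown is a hypothetical placeholder.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def sas_status(sas_url: str, now: datetime) -> dict:
    """Report whether a SAS URL is read-only and how long it remains valid."""
    params = parse_qs(urlsplit(sas_url).query)
    permissions = params.get("sp", [""])[0]
    expiry_raw = params.get("se", [""])[0]
    # Assumption: expiry uses the full UTC timestamp form.
    expiry = datetime.strptime(expiry_raw, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return {
        # Read-only here means only read (r) and list (l) permissions.
        "read_only": bool(permissions) and set(permissions) <= {"r", "l"},
        "expired": now >= expiry,
        "days_remaining": (expiry - now).days,
    }


# Hypothetical SAS URL with a redacted signature.
url = (
    "https://example.blob.core.windows.net/backups"
    "?sv=2022-11-02&sp=rl&se=2025-04-01T00%3A00%3A00Z&sig=REDACTED"
)
print(sas_status(url, datetime(2025, 3, 20, tzinfo=timezone.utc)))
```

Checking the remaining validity ahead of each monthly rotation helps avoid a gap in access.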

Client Access to Geo-Replication (UK West) Database

Read access to the geo-replicated Compucare database (and other databases) in UK West is available on request.

Client Business Continuity

It is recommended that clients use Streets Heaver's Disaster Recovery Policy in conjunction with their own to build standard operating procedures covering incidents of escalating severity.