
Disaster Recovery Architecture


Disaster recovery architecture is the technical infrastructure that determines whether your organization can actually restore its systems after a failure, not whether a plan says it can.

Most organizations have some form of backup. Most have never successfully restored from it under real failure conditions: under time pressure, with the people who are actually available at 3am on a Sunday. The gap between having backup infrastructure and having a validated recovery capability is where the exposure lives, and it is why DR architecture is distinct from business continuity planning.

BCP covers the organizational response: who does what, in what order, communicating with whom. DR architecture is the technical substrate that either makes the BCP executable or makes it fiction.

A BCP that states a 4 hour RTO for a system whose actual recovery time, under realistic conditions, is 14 hours is not a plan; it is a liability. The architecture must be designed, implemented and validated to deliver the RTO the plan commits to.

This engagement designs the architecture, specifies the implementation and validates that it achieves the required performance. We do not implement the infrastructure; your team or an infrastructure partner does that, separately and additionally. We design, specify, test and validate. The separation matters: design errors cost far less to correct on paper than in deployed infrastructure.

Price Range
£18,000 to £220,000+
Design and validation only. Infrastructure implementation is separate and additional.
Duration
8 weeks to 12 months
Design phase only. Validation follows implementation; the end to end timeline depends on how fast your team or infrastructure partner implements.
Tiers
Foundation (SME) · Advanced (Mid market) · Enterprise
Scope boundary
We design the architecture and validate the implemented result. Procurement, configuration and deployment of infrastructure is out of scope and separately costed.
Frameworks
ISO 22301 · DORA Arts. 12 and 13 · NIS2 Art. 21 · UK GDPR Art. 32 · NHS DSPT Std. 9 · NCSC CAF · PRA SS1/21
Contract
Fixed price design phase. Validation phase priced separately after implementation is complete. No variable fees within agreed scope.
Critical distinction
This engagement produces architecture designs and implementation specifications. It does not procure, configure or deploy any infrastructure. The cost of implementing the designed architecture (storage, compute, networking, licences, cloud services) is entirely separate from and additional to this engagement fee. These costs are typically 3 to 15× the design fee depending on scale.

DR implementations fail in specific, predictable ways and understanding them before you build is cheaper than discovering them during a real incident.

58% of backups fail on first recovery attempt during real incidents. This is not bad luck; it is the predictable result of specific architectural and operational failures that are well understood and entirely preventable if they are addressed at design time rather than discovered at incident time. Each failure mode below has a specific architectural response, and the response must be in the design from the start. It cannot be retrofitted after a failure reveals it is missing.

01
Backup completion ≠ backup validity
Backup jobs report success. The backup files are corrupt, incomplete or encrypted in a way the restore process cannot reverse. This is discovered only during a restore attempt. The monitoring system shows green. The actual backup is useless.
Seen in
Tape backup systems with silent write errors. Snapshot based backups where application consistency was never configured. Cloud backup agents that completed the job but failed to upload all blocks. Encrypted backups where the key rotation invalidated older backups silently.
Architectural response
Automated restore verification on a defined schedule and not checking that the job completed but actually restoring to an isolated environment and verifying data integrity and application functionality. Restore tests logged with results. Failures trigger alerts and human review, not just job retry.
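A minimal sketch of what "verify, don't just complete" means in practice: after restoring a backup into an isolated location, compare file checksums against a manifest captured at backup time. All names here are illustrative, and a real programme would also start the application and run functional checks, not just integrity checks.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large backups need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(restore_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing from the restore or whose
    checksum does not match the manifest captured at backup time.
    An empty list means the restored data is byte-identical to the source."""
    failures = []
    for rel_path, expected in manifest.items():
        candidate = restore_dir / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            failures.append(rel_path)
    return failures
```

A restore test only counts if a non-empty result here triggers an alert and human review, rather than a silent job retry.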
02
Recovery environment diverges from production
The DR environment was configured once, months or years ago. Production has since been updated: new application versions, changed configurations, additional dependencies, revised network topology. The recovery environment still reflects the original state. The application restores successfully into an environment where it cannot function.
Seen in
Application restored to DR site but requires a database version that was upgraded in production 6 months ago. Restored VMs reference network shares that were renamed. Firewall rules in DR allow connections that production’s updated security policy now blocks at the application layer.
Architectural response
Infrastructure-as-Code for the DR environment, version controlled and kept in sync with production through the same change management pipeline. DR environment configuration is a derivative of production configuration, not a separate manually maintained copy. Any production change that doesn’t update the IaC triggers a configuration drift alert.
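At its core, drift detection is a structural diff between the production configuration and the DR copy derived from it. The sketch below assumes both are available as nested dicts (for example, rendered from IaC state); the shape of the sample configs is invented for illustration.

```python
def config_drift(prod: dict, dr: dict, prefix: str = "") -> list[str]:
    """Return dotted key paths where the DR configuration differs from
    production: changed values, keys missing from DR, or keys that
    exist only in DR. An empty list means the environments match."""
    drift = []
    for key in sorted(set(prod) | set(dr)):
        path = f"{prefix}{key}"
        if key not in dr:
            drift.append(f"{path}: missing in DR")
        elif key not in prod:
            drift.append(f"{path}: only in DR")
        elif isinstance(prod[key], dict) and isinstance(dr[key], dict):
            drift.extend(config_drift(prod[key], dr[key], path + "."))
        elif prod[key] != dr[key]:
            drift.append(f"{path}: prod={prod[key]!r} dr={dr[key]!r}")
    return drift
```

Run on every production change; any non-empty result is the configuration drift alert the text describes.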
03
Replication lag causes data corruption at failover
Asynchronous replication carries a lag: transactions committed on the primary have not yet reached the replica. At the moment of unplanned failover, these transactions are lost. For some applications, a partial transaction state in the replica causes not just data loss but active data corruption: the database is inconsistent, not merely out of date.
Seen in
Financial systems where the replica received a debit but not the corresponding credit. ERP systems where a partially replicated order left inventory quantities inconsistent. Database clusters where the new primary promoted a replica that was 8 minutes behind, then the 8 minute gap was overwritten by the new primary’s writes before the gap was assessed.
Architectural response
RPO appropriate replication mode selection: synchronous replication where data loss is genuinely unacceptable (accepting the latency cost); asynchronous with known and accepted lag for systems where partial data loss is tolerable. Application consistent snapshots for systems that cannot tolerate mid transaction failover. Pre failover consistency checks in the automated runbook before traffic is redirected.
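A sketch of the pre-failover gate in the automated runbook: refuse to promote a replica whose measured lag exceeds the agreed RPO, or whose application-level consistency check failed. How lag and consistency are measured is deliberately left as inputs; on PostgreSQL, for instance, lag would come from comparing WAL positions, but that is an assumption, not part of the source.

```python
from dataclasses import dataclass

@dataclass
class FailoverDecision:
    proceed: bool
    reason: str

def pre_failover_check(replica_lag_seconds: float,
                       rpo_seconds: float,
                       consistency_ok: bool) -> FailoverDecision:
    """Gate an unplanned failover: promote the replica only if its
    replication lag is within the agreed RPO and the application level
    consistency check (e.g. paired debit/credit rows) passed.
    Anything else routes to a human, never to automatic promotion."""
    if not consistency_ok:
        return FailoverDecision(False,
            "replica failed consistency check: manual review required")
    if replica_lag_seconds > rpo_seconds:
        return FailoverDecision(False,
            f"lag {replica_lag_seconds:.0f}s exceeds RPO {rpo_seconds:.0f}s: "
            "data loss must be accepted explicitly before promotion")
    return FailoverDecision(True, "within RPO and consistent")
```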
04
Failover succeeds technically. Application does not function.
The VM starts. The database comes online. The network connections are established. The application fails to start because it cannot reach a dependency that was not part of the failover: an authentication service, a licensing server, an external API, a configuration management system, a certificate authority. The dependency was not in the DR architecture because no one mapped it.
Seen in
Application fails in DR because Active Directory replication to the DR site was configured but never validated. SaaS licensing server validates against a production IP that is now unavailable. SSL certificates tied to a hostname that resolves differently in DR. Configuration management tool (Ansible, Chef, Puppet) pulls from a production endpoint that is unreachable from the DR network.
Architectural response
Full application dependency mapping before architecture design, including every external service the application calls during startup, authentication, and normal operation. Each dependency assessed for DR behavior. Dependencies that cannot fail over must have DR specific equivalents. Recovery runbooks include dependency verification steps before the application is declared recovered.
05
DNS and network routing delay RTO beyond target
The application is running in the DR environment. Users and integrations cannot reach it because DNS TTLs have not expired, BGP route propagation has not completed or load balancer health checks have not confirmed the new endpoints. The technical recovery is complete. The RTO target is breached while waiting for network propagation.
Seen in
DNS TTL was 3600 seconds: 1 hour of propagation time that was never accounted for in the RTO. BGP route advertisement to the DR site’s IP ranges took 45 minutes longer than estimated. Internal DNS caching in client machines and network equipment held stale records for hours after the zone was updated.
Architectural response
Network failover time must be measured and included in the RTO calculation before the RTO is committed to. DNS TTLs pre reduced to appropriate values before a known planned failover. For unplanned failover, anycast addressing or pre-staged DNS entries minimize propagation delay. BGP pre announcement and route validation tested in advance. RTO target set only after measuring actual network failover time in test conditions.
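Worst-case DNS cutover time is bounded below by the TTL in effect when the record changes: a client that cached the record just before failover will not re-query until that TTL expires. A sketch of folding measured network figures into the RTO commitment; all numbers are illustrative.

```python
def network_failover_budget(dns_ttl_s: int,
                            bgp_convergence_s: int,
                            lb_health_check_s: int) -> int:
    """Worst-case seconds before all clients reach the DR endpoint.
    DNS and BGP cutover serve different client paths in parallel, so
    the budget is the slower of the two; load balancer health checks
    must confirm the new endpoints before either path carries traffic."""
    return lb_health_check_s + max(dns_ttl_s, bgp_convergence_s)

def committed_rto_s(detection_s: int, mobilisation_s: int,
                    recovery_s: int, network_s: int) -> int:
    """The RTO the plan may commit to covers every phase,
    not just the restore itself."""
    return detection_s + mobilisation_s + recovery_s + network_s
```

With a 3600 s TTL, DNS alone can add up to an hour to the RTO; pre-reducing the TTL before a planned failover shrinks exactly that term.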
06
The DR environment itself is unavailable
The DR site or cloud region is unavailable at the same time as the primary. This is not a theoretical risk: if the disaster is regional (flood, power grid failure, civil disruption) and the DR site is in the same region, both fail simultaneously. If the DR site is a cloud region that shares infrastructure with the primary cloud region, a provider failure can take both down. Cloud availability zone failures can affect multiple zones in the same region simultaneously.
Seen in
DR data centre located 8 miles from primary, in the same flooding zone. Azure West Europe and North Europe sharing a fault domain in a specific failure scenario. AWS availability zone failure affecting multiple zones in us-east-1 simultaneously. Private MPLS circuit between primary and DR sites routed through the same physical cable bundle.
Architectural response
DR site separation validated as genuinely independent: separate power grid supply, separate network carrier, separate geographic zone for flood/seismic/civil risk. For cloud DR, multi region with regions that do not share underlying infrastructure. Physical site selection documented and justified. Network path diversity verified to the physical carrier level, not just the logical level.
07
Ransomware has encrypted the backups
The ransomware attack encrypted production systems. The backups are also encrypted: because the backup agent ran with the same credentials as the compromised account, because the backup storage was mounted and accessible from the infected network, or because the attacker specifically targeted the backup infrastructure, which they do because they know that destroying backup capability increases the probability of ransom payment.
Seen in
Backup repository mounted as a network drive, encrypted alongside the file server it was protecting. Backup credentials compromised in the same credential harvesting attack that preceded the ransomware deployment. Cloud backup with insufficient access controls: the attacker authenticated to the cloud console and deleted or encrypted all backup versions. Snapshot retention set to 7 days while the attacker waited 10 days after initial access before deploying ransomware.
Architectural response
Immutable backup storage (WORM: Write Once, Read Many) that cannot be modified or deleted by any account, including administrative accounts, for the defined retention period. Air gapped copies with no network path from the production environment. Backup credentials isolated from production credential stores, with a separate identity provider or hardware based authentication. Retention period set to exceed the attacker’s average dwell time (industry average: 21 days before ransomware deployment). Offline validation copy that the attacker cannot reach.
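The 7-day-retention failure described above reduces to an arithmetic check that can run at design time: immutable retention must outlast the dwell time you plan against. The dwell-time figure comes from the text; the default safety margin is an illustrative assumption.

```python
def retention_is_sufficient(retention_days: int,
                            dwell_time_days: int,
                            safety_margin_days: int = 7) -> bool:
    """Immutable retention must exceed the attacker dwell time planned
    against (time between initial access and ransomware deployment),
    plus a margin, or the oldest clean restore point will already have
    aged out by the time the encryption event is detected."""
    return retention_days >= dwell_time_days + safety_margin_days
```

Applied to the example: 7-day retention against a 10-day dwell leaves no clean copy; a 21-day average dwell needs at least 28 days of immutability with the default margin.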
08
Recovery requires knowledge that isn’t in the runbook
The recovery procedure was written by someone who no longer works there or who is unavailable during the incident. The runbook says “restore the database” but not which account to use, what the target hostname is in DR, what the expected output looks like or what to do when the third step produces an error that the author knew how to handle because they had seen it before. The person executing it cannot proceed without calling someone, and the person to call is unreachable at 3am on a Sunday.
Seen in
Runbook references “the admin account” without specifying which one or where the credentials are stored. Recovery procedure assumes the operator can see a specific monitoring dashboard which is hosted on the system being recovered. Step 7 says “verify replication” without stating what tool to use, what a successful output looks like, or what to do if it is not successful.
Architectural response
Runbooks written at command level: specific commands, specific expected outputs, specific error handling for each known error state. All credentials referenced by location in a system accessible from the DR environment (not the system being recovered). All verification steps include what success looks like and what failure looks like. Runbooks executed by someone who has never seen the environment before as the final validation test: if they cannot complete it without asking questions, the runbook is not complete. Runbooks stored in at least two locations physically separated from the primary environment.
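"Command level" can be made concrete by treating each runbook step as data: the exact command, what success looks like, and the documented action for each known error. A minimal sketch of that structure with an executor that refuses to continue past an unverified step; the sample step and command names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RunbookStep:
    name: str
    command: str                   # the exact command, never "restore the database"
    expect: Callable[[str], bool]  # what successful output looks like
    on_error: dict[str, str] = field(default_factory=dict)  # known error -> action

def execute(steps: list[RunbookStep], run: Callable[[str], str]) -> list[str]:
    """Run steps in order via `run` (injected so drills can stub it out).
    Stops at the first step whose output fails its own success check;
    a step that cannot state what success looks like is not complete."""
    log = []
    for step in steps:
        output = run(step.command)
        if not step.expect(output):
            log.append(f"FAILED {step.name}: consult on_error table, do not improvise")
            return log
        log.append(f"OK {step.name}")
    return log
```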
09
Recovery time was measured in ideal conditions, not real ones
The RTO was validated during a scheduled test: the right people were available, the test was announced in advance, the test environment was clean, the team had time to read the runbook before starting and everyone knew it was a test. A real incident happens without warning, at the worst possible time, with the second-best available people, in a degraded environment where some monitoring and tooling may itself be unavailable. The measured RTO in ideal conditions is not the actual RTO.
Seen in
Scheduled recovery test achieved RTO in 3.5 hours. Real ransomware incident took 19 hours because: the incident started at 11pm, the primary on call engineer was ill, the backup engineer needed 2 hours to understand the environment, monitoring systems were partially affected so the scope of the incident took 4 hours to establish and two recovery steps failed because the environment differed from the test environment.
Architectural response
RTO targets set with explicit time budget accounting for incident detection delay, initial scope assessment, team mobilization, communication with suppliers, actual recovery execution and post recovery verification. Test programme includes unannounced exercises at non business hours with the second tier on call team. Recovery procedures validated against degraded monitoring conditions. RTO stated as the 90th percentile expected outcome, not the best case measured outcome.
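Stating RTO as a 90th percentile rather than a best case is a one-line computation over drill measurements. A sketch using the nearest-rank percentile method; the sample drill times in the test are invented for illustration.

```python
import math

def p90_rto(measured_hours: list[float]) -> float:
    """90th percentile of measured recovery times (nearest-rank method).
    This is the figure the plan should commit to; the fastest drill
    under ideal conditions is the figure it should never commit to."""
    if not measured_hours:
        raise ValueError("no drill measurements: the RTO is unknown, not zero")
    ordered = sorted(measured_hours)
    rank = math.ceil(0.9 * len(ordered))  # nearest-rank: 1-based index
    return ordered[rank - 1]
```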

What each recovery tier actually means and what it costs to implement

Recovery tiers are defined by RTO and RPO targets. The tier appropriate for a given system is determined by the Business Impact Analysis for that system: the financial, regulatory and operational consequence of downtime at each duration. Every system in your environment requires a tier classification, and getting the tier wrong in either direction is expensive: under investment means the RTO cannot be met; over investment means you are paying for capability you do not need. The infrastructure implementation costs below are indicative ranges; actual costs depend on system complexity, data volumes, vendor pricing and existing infrastructure.

Tier 0 — Hot Standby
RTO target: < 15 minutes · RPO target: zero data loss
Architecture: Synchronous replication to active standby. Automated failover with no manual steps. Load balancer or DNS health check triggers redirect.
Typical implementation cost (per system): £40,000 to £250,000+ annually in infrastructure. Doubles the compute and storage footprint. Requires a low latency network between primary and standby, which typically limits DR site distance.
Appropriate for: Payment processing. Clinical systems with direct patient safety impact. Real time trading platforms. Systems where any data loss creates regulatory breach.

Tier 1 — Warm Cloud
RTO target: 1 to 4 hours · RPO target: < 15 minutes
Architecture: Asynchronous replication to cloud hosted replica. Pre staged recovery infrastructure provisioned but not fully running. Automated provisioning completes on activation. Manual approval step before traffic redirect.
Typical implementation cost (per system): £8,000 to £60,000 annually in cloud infrastructure and replication costs. A significant reduction from Tier 0, achieved by accepting a 1 to 4 hour RTO and minimal replication lag.
Appropriate for: Core ERP, CRM and operational systems. Primary databases supporting multiple business critical applications. Systems where 1 to 2 hours of downtime causes significant but survivable financial impact.

Tier 2 — Warm Backup
RTO target: 4 to 24 hours · RPO target: < 4 hours
Architecture: Scheduled incremental backup to cloud or secondary storage. Recovery infrastructure provisioned from IaC templates on activation. Manual recovery execution from documented runbooks.
Typical implementation cost (per system): £2,000 to £20,000 annually. Lower infrastructure cost but higher manual effort at recovery time and higher RTO exposure.
Appropriate for: Line of business applications with moderate criticality. Supporting systems whose failure affects productivity but not immediate revenue or safety. Systems with natural daily break points that define acceptable RPO.

Tier 3 — Cold Backup
RTO target: 24 to 72 hours · RPO target: < 24 hours
Architecture: Full backup on defined schedule. No pre staged infrastructure. Recovery requires manual provisioning from scratch. Suitable only for systems whose MTPD allows multi day downtime.
Typical implementation cost (per system): £500 to £5,000 annually in storage costs. Lowest infrastructure cost. Highest recovery time. Most commonly selected inappropriately for systems whose actual MTPD is shorter than the RTO.
Appropriate for: Archive and historical data systems. Development and test environments. Low usage internal tools with no revenue or safety dependency. Systems with defined seasonal usage where off season downtime is acceptable.
The most expensive mistake in DR architecture is assigning systems to lower tiers than their MTPD requires to save infrastructure cost and then discovering during an incident that the 24 hour recovery time for a Tier 3 system that should have been Tier 1 has breached both the organizational MTPD and a regulatory obligation. The cost of a Tier 1 upgrade per system is modest compared to the cost of a DORA fine, an NHS Serious Incident, or 20 hours of trading platform downtime. The BIA establishes the correct tier for each system before any architecture decision is made.

Three engagement tiers: the design phase is fixed price, and the validation phase is priced after implementation.

Each engagement has two phases: Design (fixed price, delivered by RJV) and Validation (fixed price, scoped and agreed after the implementation is complete). The gap between phases is implementation, executed by your team or an infrastructure partner and outside the scope of this engagement. The validation phase cannot begin until implementation is sufficiently complete to test. In practice this means the end to end timeline from engagement start to validated recovery capability depends substantially on how quickly implementation proceeds, which is outside our control.

Foundation Tier
Foundation DR Architecture
For organizations with up to 20 systems requiring DR coverage, a single primary site, no Tier 0 (zero RPO) requirements and up to one regulatory framework. Typical clients are SMEs, single site operations, small healthcare providers, independent schools and small charities with significant data obligations. If any system requires a sub 1 hour RTO, you are in the Advanced tier.
£18,000
Design phase · fixed · VAT excl.
Design: 8 weeks. Validation priced separately after implementation, typically £6,000 to £14,000.
Phase 1 — Design (RJV delivers)
Architecture & Specification
£18,000 fixed
8 weeks
Current state DR assessment: backup infrastructure, replication, recovery procedures, last test results
System classification: tier assignment for all systems in scope (up to 20), with MTPD and BIA references for each
Full application dependency mapping for all Tier 1 to 2 systems, including the startup dependencies often missed in initial assessment
Recovery architecture design: replication method, storage architecture, network failover approach, cloud platform selection if applicable
Ransomware resilience design: immutable backup specification, air gap approach, retention period calculation based on threat model
Implementation specification: a technical document detailed enough that your infrastructure team can build from it without interpretation
Infrastructure cost estimate: itemized estimate of annual implementation cost per system per tier, for board budget approval
1 × command level recovery runbook per Tier 1 system (up to 5 systems)
Regulatory compliance gap analysis under selected framework only
Implementation of any infrastructure
Tier 0 (zero RPO synchronous replication) architecture
Phase 2 — Validation (after client implements)
Testing & Evidence Generation
£6,000 to £14,000 fixed
Priced after implementation is complete. 3 to 4 weeks to execute.
Implementation review: verify the built infrastructure matches the design specification before any testing begins
Automated restore verification: validate that backup jobs are producing restorable backups, not just completing successfully
Runbook executability test: execution by an engineer unfamiliar with the environment, which identifies all undocumented knowledge dependencies
Recovery test per Tier 1 system: full restore to isolated environment, RTO measured, application functionality verified
Network failover timing measurement: DNS propagation and routing convergence, with actual times included in the validated RTO
All failures root cause analyzed. Remediation classified as RJV error or implementation gap. Re test after remediation.
Compliance evidence pack: test results in regulatory submission format for the selected framework
Validated RTO and RPO per system, documented as achieved performance, not target performance
Testing Tier 3 systems (cold backup verification only)
Annual re validation (available separately at £5,500 per year)
Phase 1 Timeline — 8 Weeks (full client cooperation)
Wk 1
Current State Assessment
Existing backup and DR review. Infrastructure access. Last test results if any.
A common delay: no test results exist because the last recovery test was years ago or never done. This does not block the assessment; the gap is documented.
Wk 2
System Classification
Tier assignment for all systems in scope (up to 20). BIA data is required as input; if no BIA exists, assumptions are documented.
If no BIA exists, tier assignments will be based on estimated impact. This introduces uncertainty into the architecture. A full BIA is strongly recommended first.
Wk 3 to 4
Dependency Mapping
Full application dependency mapping for Tier 1 to 2 systems. This takes longer than clients expect because startup dependencies are systematically underdocumented.
Dependency mapping almost always reveals systems or dependencies not initially known to be in scope. Each discovery assessed for tier impact.
Wk 5 to 6
Architecture Design
Full recovery architecture. Ransomware resilience design. Implementation specification. Infrastructure cost estimate.
Cost estimate commonly produces budget shock. Allow time for board review and approval before implementation begins.
Wk 7 to 8
Runbooks & Handover
Command level runbooks for Tier 1 systems. Compliance gap analysis. Implementation specification review with IT team.
Runbook review by the IT team often reveals that the implementation specification requires clarification. Allow 3 business days for Q&A after delivery.
What Your Team Provides (Phase 1)
IT lead, 8 to 12 hours across weeks 1 to 4, for current state assessment and dependency mapping interviews
Read only access to backup infrastructure consoles, network documentation, application server configurations and cloud billing
Last available DR test results, however old; even failed tests are useful as inputs
BIA data with MTPD per system. If no BIA exists, this must be stated explicitly before the engagement begins; the architecture will carry higher uncertainty
Board or senior management available for cost estimate review and implementation budget approval (week 6 to 7)
Document review within 5 business days of issue
What Is Not in This Engagement
Any infrastructure procurement, configuration or deployment; this is entirely separate from and additional to this fee
Tier 0 (zero RPO) synchronous replication architecture, which requires the Advanced tier
Second or third regulatory framework compliance mapping
OT/IT convergence DR for systems with industrial control components
Runbooks for Tier 2 and Tier 3 systems (Tier 1 only, up to 5 systems)
Phase 2 validation which is separately priced and separately contracted after implementation
Annual re validation after Phase 2 from £5,500 per year
Advanced Tier
Advanced DR Architecture
For organizations with 20 to 100 systems requiring DR coverage, up to 4 sites, Tier 0 (zero RPO) requirements for some systems and up to 3 concurrent regulatory frameworks. Typical clients include NHS Trusts, financial services firms subject to DORA or FCA PS21/3, multi site manufacturers, universities and housing associations. If you operate more than 100 systems or across more than 4 sites, contact us to discuss whether Enterprise scope applies.
£72,000
Design phase · fixed · VAT excl.
Design: 14 weeks. Validation £19,000 to £75,000 depending on systems in scope, priced after implementation.
Phase 1 — Design (RJV delivers)
Architecture & Specification
£52,000 fixed
14 weeks
Current state assessment across all sites: backup, replication, recovery procedures, test history
Full system classification: all systems in scope (up to 100), tier assignment per system with documented justification
Tier 0 architecture design: synchronous replication topology, automated failover mechanism, latency requirements and site distance constraints
Full application dependency mapping: all Tier 0 to 2 systems, including startup, authentication and integration dependencies
Multi site failure isolation design: which site failures cascade to which others, and the architectural mitigations
Ransomware resilience architecture: immutable backup, air gap design, tenant isolation, credential isolation, retention model
Cloud DR architecture: cloud platform selection and justification, regional separation requirements, network connectivity between primary and cloud DR
Full implementation specification: your infrastructure team or partner can build from this document without ambiguity
Itemized infrastructure cost estimate per system per tier for all systems in scope, presented with cost benefit analysis for board approval
Command level runbooks for all Tier 0 systems and up to 20 Tier 1 systems
Compliance gap analysis and evidence structure for up to three regulatory frameworks simultaneously
DORA specific documentation (Arts. 12 and 13) where applicable, in regulatory submission format
Phase 2 — Validation (after client implements)
Testing & Evidence Generation
£19,000 to £75,000 fixed
Scoped after implementation. Typically 6 to 10 weeks to execute.
Implementation review against the design specification before testing begins; discrepancies must be resolved or the risk accepted in writing before proceeding
Automated restore verification programme: all backup jobs validated for actual restorability on a defined schedule
Runbook executability test: execution by an engineer unfamiliar with the environment
Tier 0 failover test: automated failover triggered in isolated environment. RPO measured (should be zero). RTO measured. Data consistency verified.
Tier 1 recovery test for all Tier 1 systems: full restore to isolated cloud environment, RTO measured against target, application functionality verified
Multi site failure scenario: failure of one site with recovery to another; network routing, DNS and application behavior all tested
Ransomware scenario exercise: simulated ransomware event, recovery from immutable backup, RTO measured from time of detection
All failures root cause classified and remediated. Re test after remediation. No system signed off until target RTO/RPO achieved.
Compliance evidence packs for all 3 frameworks in regulatory submission format
Year 1 re validation included in validation phase price
Phase 1 Timeline — 14 Weeks (full client cooperation)
Wk 1–2
Multi-Site Assessment
Current state review across all sites. Existing backup, replication, test history.
Multi site access provisioning is the most common source of early delay. Provision access before week 1 begins.
Wk 3 to 4
System Classification
All systems in scope (up to 100) classified. Tier assignment documented. BIA data required as input for all Tier 0 to 2 systems.
Systems missing BIA data are classified provisionally, subject to revision when BIA data is provided. Provisional classification is flagged in all documents.
Wk 5 to 7
Dependency Mapping
Full mapping for all Tier 0 to 2 systems. Startup, auth, integration dependencies. Multi site dependency cascade analysis.
In multi site environments, cross site dependencies are systematically underdocumented. Allow a 1 week buffer beyond the estimate for investigation of inter site dependencies.
Wk 8 to 10
Architecture Design
Tier 0 synchronous topology. Cloud DR design. Ransomware resilience. Multi site isolation. Full implementation specification.
Tier 0 architecture for latency sensitive systems requires network latency measurements between candidate DR sites. Arrange this before week 8.
Wk 11 to 12
Cost Estimate & Runbooks
Itemized infrastructure cost estimate. Cost-benefit analysis. Runbooks for all Tier 0 and Tier 1 systems (up to 20).
Board approval of infrastructure investment is required before implementation begins. In organizations with monthly board cycles, allow 4 to 6 weeks between cost estimate delivery and implementation start.
Wk 13 to 14
Regulatory Docs & Handover
Compliance gap analysis for all 3 frameworks. DORA documentation. Implementation specification Q&A with IT team or partner.
DORA documentation review by legal compliance team typically adds 1 to 2 weeks. Build this into implementation planning.
What Your Team Provides (Phase 1)
IT lead and infrastructure team, 20 to 30 hours total across weeks 1 to 10, for assessment and dependency mapping
Site IT contacts at each site, available for 4 to 6 hours each during the multi site assessment phase
Read only access to all infrastructure across all sites before week 1, including cloud consoles, backup platforms, network management
BIA data for all systems requiring Tier 0 to 2 classification. Systems without BIA data proceed with provisional classification, flagged in all documents
Network latency measurements between primary and candidate DR sites, arranged before week 8
Legal and compliance team to review regulatory documentation within 5 business days of issue
Board availability for cost estimate review and infrastructure investment approval before implementation begins
What Is Not in This Engagement
Any infrastructure implementation: procurement, configuration and deployment are entirely separate and typically 5 to 15× the design fee at this scale
OT/IT convergence DR for manufacturing or CNI systems, which requires Enterprise scope
More than 100 systems or more than 4 sites; this is the scope ceiling, and excess systems are priced as additions at £850 per system
Runbooks for Tier 2 systems (Tier 0 and Tier 1 only, up to 20 total)
A 4th regulatory framework and beyond: a scope addition, priced and approved before execution
Phase 2 validation, which is separately contracted after implementation is sufficiently complete to test
Year 2+ re-validation after Phase 2 from £16,000 per year
Enterprise Tier
Enterprise DR Architecture
For organizations above 100 systems, more than 4 sites, OT/IT convergence requirements, critical national infrastructure designation or DORA TLPT scope. Also appropriate for organizations integrating DR architecture into a post acquisition infrastructure programme. All enterprise engagements are individually scoped at the assessment session. The price below is the minimum; actual scope determines actual price.
From £120,000
Design phase · individually scoped · VAT excl.
Design: from 6 months. Validation scoped and priced after implementation. Enterprise implementations typically take 6 to 18 months, with validation following.
Phase 1 — Design
Scope set at assessment session
From £120,000
From 6 months. Actual timeline agreed at scoping.
Full enterprise system inventory and classification, with no ceiling on systems or sites
OT/IT convergence DR with industrial control systems, SCADA, PLCs using passive only assessment methods that do not disrupt running operations
Multi jurisdiction architecture with cloud sovereignty requirements, data residency by jurisdiction, cross border replication legal constraints
All applicable regulatory frameworks with no cap: DORA, FCA PS21/3, NIS2, NHS DSPT, NCSC CAF, ISO 22301 and sector specific requirements
DORA TLPT programme design for significant entities (execution by separately engaged accredited TLPT provider)
Ransomware resilience at enterprise scale with multi tenant backup isolation, HSM credential separation, tiered immutability
Full runbook suite for all Tier 0 to 2 systems across all sites
Post acquisition DR integration where the enterprise programme includes integrating an acquired organization’s DR architecture
Phase 2 — Validation
Priced after implementation
Individually scoped
Enterprise validation typically takes 3 to 6 months, depending on the number of systems and the testing windows available.
Implementation review across all sites before testing begins
Automated restore verification programme across all backup jobs
Full recovery testing for all Tier 0 and Tier 1 systems across all sites
OT system recovery validation using passive monitoring and isolated test environments only
Multi site simultaneous failure scenario testing
Full ransomware scenario exercise including detection, containment and recovery phases
DORA TLPT coordination (preparation and evidence management; execution by accredited provider)
Compliance evidence packs for all applicable frameworks, in submission ready format
2 year re-validation programme included
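To illustrate what automated restore verification across all backup jobs means in practice, a minimal sketch: restore each backup to an isolated location and compare a checksum of the restored data against the value recorded at backup time. The function names and the single-file example are illustrative assumptions; a real programme would drive the backup platform's own APIs across every job.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Checksum a restored file so it can be compared to the value recorded at backup time."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(restored_file: Path, expected_sha256: str) -> bool:
    """A restore only counts as verified when the restored content matches the original."""
    return restored_file.exists() and sha256_of(restored_file) == expected_sha256

# Illustrative run: a temporary file stands in for data restored from a backup job.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"backup payload")
original = hashlib.sha256(b"backup payload").hexdigest()
print(verify_restore(Path(tmp.name), original))  # True only if content survived the round trip
```

The point of the sketch is the comparison step: a backup job that completes without error has still not been verified until restored content is checked against the original.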
Enterprise-Specific Requirements
Named C suite programme sponsor with authority to approve infrastructure investment at enterprise scale, typically a CTO, CIO or CISO
Dedicated internal programme coordinator with access across all sites and all teams
OT team involvement for OT/IT convergence work: OT engineers must be available for assessment and must approve any assessment activities in their environment before they begin
Legal team availability for multi jurisdiction data residency and regulatory compliance review, typically the longest lead time resource in enterprise engagements
Infrastructure partner or internal team capable of implementing enterprise scale DR, identified and available before Phase 1 concludes
Why Enterprise DR Takes Longer Than Expected
OT system assessment using passive only methods is slow by design: active scanning in OT environments risks disrupting running operations
Multi jurisdiction data residency analysis requires legal input that typically moves on different timescales than technical work
Enterprise infrastructure procurement cycles can add 3 to 6 months between architecture approval and implementation readiness for validation
Recovery testing windows for Tier 0 production systems in regulated environments require planned maintenance windows that may need to be booked 6 to 8 weeks in advance
DORA TLPT engagement with an accredited provider runs on the provider’s availability schedule, not ours; plan for 3 to 6 months of lead time

What both parties commit to and what happens when either fails.

These obligations are in the contract before work begins. The DR architecture engagement has specific dependencies that make bilateral obligations especially critical: the design depends on accurate information about the current state; the validation depends on implementation being complete and correct; and the programme spans two separately contracted phases with a client controlled implementation gap between them. Both phases require active participation from your team at defined points.

Client Obligations
Accurate disclosure of the current state
The architecture is designed from what you tell us and what we can observe with the access provided. If the current state is misrepresented (backup jobs reported as working when they have never been tested, DR systems reported as operational when they are partially decommissioned), the architecture will be wrong in ways we cannot correct, because we do not know the actual state. Disclose the real state, including the uncomfortable parts. We will document it accurately and design from it honestly.
If the current state is later found to differ materially from what was disclosed: stages built on incorrect information must be re-executed. This is a scope change. Cost is assessed at the time and presented for approval before re-execution.
Provide BIA data before architecture begins
System tier classification requires knowing the MTPD (maximum tolerable period of disruption) for each system, derived from a Business Impact Analysis. Without this, tier assignments are estimates. An architecture built on estimated tier assignments may over-invest in some systems and under-protect others. If you do not have a BIA, we will assess whether a lightweight BIA can be conducted as part of this engagement or whether the BCP programme should precede the DR architecture.
If no BIA is available and you proceed anyway: all tier assignments without BIA basis are documented as provisional. Final tier assignments are subject to revision when BIA data is available. The architecture carries explicit uncertainty notation on all provisional-tier systems.
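To make the provisional-classification rule concrete, a minimal sketch of MTPD-driven tier assignment. The thresholds, the 24 hour default for systems without BIA data and all names are illustrative assumptions, not the engagement's actual classification rules:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative MTPD thresholds in hours: (tier, upper bound). Real thresholds
# come from the BIA and the engagement's own classification rules.
TIER_THRESHOLDS = [(0, 4), (1, 24), (2, 72)]

@dataclass
class SystemTier:
    name: str
    tier: int
    provisional: bool  # True when no BIA-derived MTPD exists

def classify(name: str, mtpd_hours: Optional[float]) -> SystemTier:
    """Assign a DR tier from MTPD; flag the result as provisional without BIA data."""
    provisional = mtpd_hours is None
    hours = 24.0 if mtpd_hours is None else mtpd_hours  # assumed default for estimates
    for tier, upper_bound in TIER_THRESHOLDS:
        if hours <= upper_bound:
            return SystemTier(name, tier, provisional)
    return SystemTier(name, 3, provisional)  # beyond the Tier 2 threshold

print(classify("payments-api", 2))   # Tier 0, firm
print(classify("hr-portal", None))   # provisional: flagged in all documents
```

The `provisional` flag is the part that matters: it is what lets the architecture carry explicit uncertainty notation on every system classified without BIA data.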
Implement the architecture as specified
The implementation specification is a technical document your team or infrastructure partner follows. Deviations from the specification (to reduce cost, to use preferred vendor alternatives, to skip steps that seem redundant) create divergences between what was designed and what was built. The validation phase will reveal these divergences. If they compromise recovery capability, they must be corrected. We are available for implementation Q&A during the implementation phase at our standard advisory rate; use this before making implementation decisions that deviate from the specification.
If implementation deviates from the specification: validation phase begins with an implementation review. Deviations are documented and risk-classified. High risk deviations must be remediated before testing proceeds. Remediation cost is borne by the client.
Notify RJV before making production changes during the engagement
Infrastructure changes made during Phase 1 without notification can invalidate in progress architecture work. A new critical system added during dependency mapping may require revisiting the tier classification. A network topology change during architecture design may require revising the failover routing. Notify us of any significant production changes during the engagement. We will assess the impact and adjust the work accordingly. A brief notification saves significant rework.
If production changes occur without notification: we are not accountable for architecture that does not reflect changes we were not told about. Rework required to incorporate unnotified changes is a scope addition.
RJV Obligations
Design to the systems we can observe and flag what we cannot
We design from what the investigation reveals. If access is restricted to certain systems or if specific information cannot be provided, we document the limitation and flag the architectural uncertainty it creates. We do not assume what we cannot verify. Every assumption in the architecture is labelled as an assumption with the condition that would make it invalid.
If an assumption is later found to be incorrect: we present a written impact assessment and the options for addressing it. If it requires rework, we classify the cause and apply costs accordingly.
Infrastructure cost estimates provided before implementation begins
We provide itemized infrastructure cost estimates as part of the design phase, before any implementation begins. These are estimates, not quotations: actual procurement costs depend on vendor pricing at the time of purchase, volume discounts, existing contract terms and market conditions. We will not underestimate costs to make the architecture appear more affordable. We will state where significant uncertainty exists in the estimate and why.
If implementation costs exceed the estimate materially: if actual costs exceed our estimate by more than 20% for reasons within our analytical control, we will discuss the discrepancy and its cause with you. We cannot control vendor pricing changes after the estimate is produced.
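The 20% materiality threshold is a simple arithmetic rule; a sketch, with illustrative figures:

```python
def materially_exceeds(estimated: float, actual: float, threshold: float = 0.20) -> bool:
    """True when actual cost exceeds the estimate by more than the agreed threshold (20% here)."""
    return actual > estimated * (1 + threshold)

print(materially_exceeds(100_000, 118_000))  # 18% over: within threshold, False
print(materially_exceeds(100_000, 125_000))  # 25% over: material, True
```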
Test results reported as measured, not as targeted
The validation phase measures actual recovery performance under realistic test conditions. We report what the measurement shows, not what the target was. If a system achieves an RTO of 5 hours 40 minutes against a 4 hour target, the report says 5 hours 40 minutes and the system is not signed off until remediation closes the gap. We do not report a system as passing if it does not pass. There is no pressure on our side to report passing results; a passing result that conceals an actual failure is a liability for both parties.
If the client disputes a test result classification: all test results are documented with methodology, timestamps and raw output. We can provide full test logs. If you believe a test was conducted incorrectly, raise it within 5 business days of receiving the test report and we will review the methodology together.
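The measured-not-targeted principle can be sketched in a few lines. The system name, timestamps and report format are illustrative assumptions; the point is that the report carries the measurement and derives pass/fail from it, never the reverse:

```python
from datetime import datetime, timedelta

def measured_rto(failure_declared: datetime, service_restored: datetime) -> timedelta:
    """RTO as measured: wall-clock time from failure declaration to restored service."""
    return service_restored - failure_declared

def report(system: str, measured: timedelta, target: timedelta) -> str:
    """Report the measurement itself; sign-off status is derived from it, never asserted."""
    status = "PASS" if measured <= target else "FAIL (remediation required before sign-off)"
    return f"{system}: measured RTO {measured}, target {target} -> {status}"

# Illustrative test record: failure declared 03:00, service restored 08:40.
measured = measured_rto(datetime(2025, 3, 1, 3, 0), datetime(2025, 3, 1, 8, 40))
print(report("finance-db", measured, timedelta(hours=4)))
# A 5h40m measured RTO against a 4h target is reported as 5h40m, not as "close enough".
```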
Phase 2 scoped honestly after Phase 1 and not before
We do not quote Phase 2 validation before Phase 1 is complete, because we cannot know precisely what we need to test until we have designed the architecture. Phase 2 is scoped and priced after Phase 1 delivery and after implementation is sufficiently complete to assess. The price ranges given on this page are indicative. The actual Phase 2 price is presented in writing after Phase 1, before any Phase 2 work begins. You are not committed to Phase 2 by engaging in Phase 1.
If the Phase 2 price is materially higher than the range on this page: we explain why in writing, with the specific factors that drove the increase. You may decline Phase 2, in which case your only obligation is the Phase 1 fee. If you decline, we provide guidance on what your team would need to do to conduct basic validation independently.

Questions to ask before signing anything

We already have a DR solution from our cloud provider / backup vendor. Why do we need an architecture engagement?
Cloud provider and backup vendor DR products provide infrastructure capability: storage, replication, compute. They do not provide the application dependency mapping, the tier classification based on your actual business impact data, the ransomware resilience architecture, the validated runbooks or the tested evidence that the infrastructure actually delivers your specific RTO targets. The product exists. Whether it is correctly configured, correctly scoped and actually able to recover your specific application stack in the time your BCP commits to is what this engagement determines. Most organizations discover gaps when they first validate properly.
What is the realistic total cost including implementation?
For a Foundation tier engagement (20 systems, Tier 1 for the top 5): design fee £18,000 + validation £6,000 to £14,000 + annual infrastructure costs typically £15,000 to £80,000 depending on system size and cloud provider pricing. For an Advanced tier engagement (100 systems): design £52,000 + validation £18,000 to £75,000 + annual infrastructure typically £80,000 to £900,000. At enterprise scale, infrastructure costs alone commonly reach £500,000 to £2M annually. These are the numbers you need before committing to an architecture programme. We provide itemized infrastructure cost estimates in Phase 1 so you have them before implementation begins.
How long between Phase 1 delivery and Phase 2 validation?
It depends entirely on how quickly implementation proceeds, which is outside our control. For Foundation tier with a capable IT team: typically 8 to 16 weeks. For Advanced tier involving cloud provider procurement, network connectivity work and configuration of 100 systems: typically 4 to 9 months. For Enterprise: 6 to 18 months is common. We cannot start Phase 2 until implementation is sufficiently complete to test. We assess implementation readiness before Phase 2 begins and report any gaps that would prevent meaningful testing.
Our existing IT team will implement the architecture. What if they make mistakes?
The implementation review at the start of Phase 2 is specifically designed to catch this. We compare the implemented infrastructure against the design specification before any testing begins. Discrepancies are documented and risk-classified. High risk discrepancies must be corrected before testing begins. If testing reveals a failure caused by an implementation error, the implementation must be corrected and the test re-run. The cost of correction is borne by the client. We are available for implementation advisory support during the implementation phase at our standard day rate; use this to reduce the probability of costly corrections later.
Can this DR architecture engagement be done before the BCP programme?
Yes, but the architecture will carry more uncertainty. DR tier classification requires MTPD data per system, which comes from a Business Impact Analysis. Without BIA data, tier assignments are based on estimated impact, which introduces the risk of over or under investment. If you have no BIA, we can conduct a lightweight impact assessment scoped to tier classification only as an add on to Phase 1 at an additional £6,000 to £18,000. This is a faster, narrower exercise than a full BCP programme BIA: sufficient to classify systems, not sufficient to produce a full business continuity programme.
What if a recovery test fails and the root cause is disputed?
Test failures are classified as RJV analytical error (our cost to remediate), implementation gap (client’s cost) or scope boundary item (discussed before proceeding). All test results are documented with methodology and raw output. If you dispute a classification, you raise it in writing within 5 business days of receiving the test report. If we cannot agree within 10 business days, an independent technical reviewer, agreed by both parties, reviews the failed test. Their determination binds both parties. This process has never been invoked; we include it because the obligation is real regardless of how unlikely it is.
What is the relationship between this engagement and the BCP programme?
Complementary but independent. The BCP programme produces the organizational response: who does what, communicating with whom, following which procedures. The DR architecture produces the technical substrate those procedures depend on. Each can be commissioned independently. If you have a BCP with committed RTOs but no validated DR architecture to support them, the RTOs are aspirational targets rather than engineering commitments. Conversely, an excellent DR architecture without a BCP leaves the organizational response undefined which is technically capable of recovery but without a coordinated plan for executing it. The complete picture requires both.
What is your payment structure?
Phase 1: 50% on contract signature, 50% on delivery acceptance. Phase 2 is a separate contract: 50% on Phase 2 contract signature, 50% on validation completion and final report acceptance. Neither phase has milestone payments during execution. Scope additions within either phase are invoiced as they are agreed, never retrospectively. Final payment for either phase is contingent on your written acceptance of the deliverables. If a deliverable does not meet the agreed specification, we remediate before raising the final invoice.

Start with a recovery assessment and bring your last test results, however old, or the fact that there are none.

A 90 minute session in which we review your current DR infrastructure, your last test results if any exist and the RTOs your BCP or regulatory framework commits you to. We assess the gap between the committed RTOs and the actual recovery capability the current infrastructure can deliver.

At the end of the session, you know whether your DR capability is adequate, marginal or fundamentally inadequate for your risk and regulatory position. Most organizations find the assessment uncomfortable.

The gap between what the BCP says the DR capability delivers and what it actually delivers is, in most cases, larger than anyone has previously acknowledged explicitly. That acknowledgement, unpleasant as it is, is the necessary starting point for any genuine improvement.

Format
Video call or in person in London. 90 minutes.
Cost
Free. No commitment.
Lead time
Within 5 business days of contact.
Bring
Most recent DR test results or confirmation that none exist. Current BCP if you have one, specifically the RTO targets it commits to. Brief overview of your critical systems and your current backup infrastructure.
Attendees
Your IT lead or infrastructure manager. Optionally, whoever has accountability for regulatory compliance. From RJV: a senior technical consultant.
After
Written summary of findings within 2 business days. Written scope and fixed Phase 1 price within 5 business days if you want to proceed.