
Hybrid AWS and On-Premises Data Disaster Recovery: A Comprehensive Solution

By Shemer Mashiach

1. Introduction:

Data plays a vital role in the modern business landscape. As such, it is crucial that its availability and integrity are ensured. Disruptions or disasters can significantly impact data accessibility, leading to severe financial and operational consequences for organizations. This paper explores the concept of data disaster recovery (DR) and presents how Playtika combines the benefits of Amazon Web Services (AWS) and on-premises infrastructure to create a robust hybrid DR solution. The hybrid approach leverages the strengths of both environments to maximize data protection, recovery speed, and cost efficiency.

2. Data Disaster Recovery:

Data disaster recovery is the process of ensuring data availability and integrity in the event of a disruption or disaster. Natural disasters, cyber-attacks, hardware failures, and human errors can all jeopardize data accessibility. A comprehensive DR plan is necessary to minimize the impact of such incidents and enable business continuity.

3. AWS Cloud Services:

Amazon Web Services (AWS) offers a wide range of cloud services that are relevant to data disaster recovery. At Playtika, we mainly use EC2 for its scalability: computing resources can be scaled up or down by adding or removing instances as the workload changes. EC2 also offers flexibility through a diverse selection of instance types, so the configuration can be matched to a specific workload, whether it is compute-intensive or memory-intensive.

4. On-premises Infrastructure:

Playtika operates 95% of its infrastructure in on-premises data centers, which play a vital role in data disaster recovery. The solution incorporates local backups and redundant configurations, with storage appliances such as Pure and Vast serving as the backbone. These appliances are continuously synchronized between our data centers, and network-attached storage (NAS) and redundant array of independent disks (RAID) are commonly used to safeguard on-premises data.

5. Hybrid Data DR Architecture:

The hybrid DR architecture combines the strengths of AWS cloud services and on-premises infrastructure to create a robust and scalable solution. To achieve dual redundancy, Playtika synchronizes data between its two on-premises data centers. With synchronized copies of the data in both environments, Playtika can ensure high availability and swift recovery in case of disruption.
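One way to gain confidence that two synchronized copies really match is to compare content checksums. The following is a minimal sketch under stated assumptions: it treats each replica as a directory tree reachable from one host, and the function names (`file_digest`, `snapshot`, `find_drift`) are hypothetical. It does not describe how the Pure or Vast appliances replicate data.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> dict:
    """Map each file path (relative to root) to its content digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_drift(site_a: Path, site_b: Path) -> dict:
    """Compare two replicas and list files that are missing or differ."""
    a, b = snapshot(site_a), snapshot(site_b)
    return {
        "missing_in_b": sorted(a.keys() - b.keys()),
        "missing_in_a": sorted(b.keys() - a.keys()),
        "different": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

An empty result in all three lists means the two trees held identical content at the time of the check; anything else is a candidate for resynchronization.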

6. Failover and Failback Strategies:

Failover refers to the transition from the primary infrastructure to a backup environment, while failback involves returning to the primary infrastructure once the disaster has been resolved. Automated failover procedures, such as DNS routing, load balancing, and IP address reconfiguration, play a crucial role in minimizing downtime. It is important to note that all data is persisted on our disaster recovery (DR) site, which ensures data availability during the failover process.
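The failover and failback decisions described above can be illustrated with a small state machine. This is a sketch, not Playtika's implementation: the thresholds (three failed health checks before failover, five successful ones before failback) and all names are assumptions chosen for illustration.

```python
from dataclasses import dataclass

PRIMARY = "primary"
DR = "dr"


@dataclass(frozen=True)
class FailoverState:
    active: str = PRIMARY           # environment currently serving traffic
    consecutive_failures: int = 0   # failed primary health checks in a row
    consecutive_successes: int = 0  # passed primary health checks in a row


def next_state(
    state: FailoverState,
    primary_healthy: bool,
    fail_threshold: int = 3,     # assumed value, tune per environment
    recover_threshold: int = 5,  # assumed value, tune per environment
) -> FailoverState:
    """Advance the state by one health-check result for the primary site."""
    if primary_healthy:
        failures, successes = 0, state.consecutive_successes + 1
    else:
        failures, successes = state.consecutive_failures + 1, 0

    active = state.active
    if active == PRIMARY and failures >= fail_threshold:
        active = DR       # failover: primary has failed repeatedly
    elif active == DR and successes >= recover_threshold:
        active = PRIMARY  # failback: primary has been stable long enough
    return FailoverState(active, failures, successes)
```

Requiring several consecutive results in each direction avoids flapping between environments on a single transient error.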

At Playtika, this entire process is fully automated and overseen by our OPS group.

This process encompasses several steps:

  1. Configuring the network in AWS.
  2. Scaling up the network configuration.
  3. Provisioning servers using Terraform.
  4. Installing necessary applications and establishing connections to our disaster recovery (DR) site.
  5. Modifying DNS/F5 records to ensure smooth transition.

In certain situations, manual failover may be necessary.
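The steps above can be sketched as an ordered runbook that stops at the first failure and flags the run for manual failover. This is an illustrative outline only: the step actions are hypothetical placeholders, and in practice each would call the relevant tooling (for example, a Terraform run or a DNS/F5 update).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A step is a (name, action) pair; names mirror the steps listed above.
Step = Tuple[str, Callable[[], None]]


@dataclass
class RunResult:
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    error: Optional[str] = None

    @property
    def needs_manual_failover(self) -> bool:
        """True when automation stopped before finishing every step."""
        return self.failed_step is not None


def run_failover(steps: List[Step]) -> RunResult:
    """Run steps in order; stop at the first one that raises."""
    result = RunResult()
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            result.failed_step = name
            result.error = str(exc)
            break
        result.completed.append(name)
    return result


def placeholder() -> None:
    """Hypothetical stand-in for a real automation call."""


FAILOVER_STEPS: List[Step] = [
    ("configure AWS network", placeholder),
    ("scale up network configuration", placeholder),
    ("provision servers with Terraform", placeholder),
    ("install applications and connect to DR site", placeholder),
    ("update DNS/F5 records", placeholder),
]
```

Calling `run_failover(FAILOVER_STEPS)` returns a `RunResult` whose `completed` list shows how far automation got, which gives an operator a clear starting point if a manual failover is required.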

7. Cost Considerations:

Cost is a crucial factor in designing a data DR solution. AWS offers flexible pricing models, including pay-as-you-go and reserved instances, allowing organizations to optimize costs based on their specific needs. Data lifecycle management techniques, such as tiering and archiving, can also be employed to help reduce storage costs.

By adopting this hybrid solution, Playtika avoids purchasing all the necessary hardware upfront. Instead, hardware resources are provisioned on demand, as they are required.
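The trade-off between buying hardware upfront and provisioning on demand can be made concrete with simple arithmetic. The sketch below is a deliberately simplified model: the function names are hypothetical, it ignores storage, data transfer, reserved-instance discounts, and the running costs of owned hardware, and any figures passed to it are the caller's own estimates rather than real AWS prices.

```python
def on_demand_cost(instances: int, hourly_rate: float, hours: float) -> float:
    """Cost of running a fleet on demand for a given number of hours."""
    return instances * hourly_rate * hours


def break_even_hours(upfront_cost: float, instances: int, hourly_rate: float) -> float:
    """Hours of on-demand use at which spend equals the upfront purchase."""
    fleet_rate = instances * hourly_rate
    if fleet_rate <= 0:
        raise ValueError("instances and hourly_rate must be positive")
    return upfront_cost / fleet_rate
```

With invented figures of 20 instances at 0.50 per hour, a 72-hour DR event costs 720, and a 100,000 upfront purchase would only break even after 10,000 hours of DR usage. Under this model, on-demand capacity is attractive when the DR environment is expected to run rarely.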

8. Testing and Maintenance:

Regular testing and maintenance are vital to the effectiveness of the DR solution. Table-top exercises, parallel testing, and full-scale simulations should be conducted to validate the readiness of the hybrid DR environment. Additionally, continuous monitoring and updates are necessary to promptly identify and address any vulnerabilities or configuration changes that could impact data DR operations. These proactive measures help maintain the resilience and reliability of the DR solution, ensuring that it can mitigate risks and facilitate seamless recovery when needed.
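Keeping testing regular is easier when overdue exercises are flagged automatically. The following sketch assumes a simple record of when each exercise type was last run and a maximum allowed age per type; the function name and the intervals in the usage note are hypothetical, not a recommended schedule.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def overdue_exercises(
    last_run: Dict[str, Optional[date]],
    max_age_days: Dict[str, int],
    today: date,
) -> List[str]:
    """Return exercise types never run or last run longer ago than allowed."""
    overdue = []
    for name, limit in max_age_days.items():
        last = last_run.get(name)
        if last is None or (today - last) > timedelta(days=limit):
            overdue.append(name)
    return sorted(overdue)
```

For example, with assumed limits of 90 days for table-top exercises, 180 days for parallel testing, and 365 days for full-scale simulations, the function reports which of them need to be scheduled next.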

9. Conclusion:

A comprehensive and robust data disaster recovery solution is achieved through a hybrid approach that combines AWS cloud services and on-premises infrastructure. This approach leverages the strengths of both environments, enabling organizations to ensure high data availability, rapid recovery, and cost efficiency. To adapt to evolving business needs and technological advancements, it is crucial to proactively plan, test, and regularly maintain the hybrid disaster recovery (DR) environment. In doing so, organizations can stay prepared for potential disruptions and confidently navigate the ever-changing landscape of data protection and recovery.