MetroCluster is a solution for building fault-tolerant and highly available storage systems with synchronous mirroring between two independent data centers. The technology is used to protect enterprise data and ensure business continuity in the event of outages or disasters at one site.
MetroCluster was developed by NetApp and is widely used in mission-critical IT infrastructures where downtime is unacceptable, such as banking, telecom, government systems, large-scale retail, and healthcare.
How It Works
MetroCluster combines two remote data centers into a unified clustered system:
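- Synchronous replication – each write is committed at both sites before it is acknowledged to the host, so no acknowledged data is lost if one site fails.
- Automatic failover – if one site goes down, the surviving site takes over serving its data (a switchover). Depending on the configuration, the switchover is triggered automatically or initiated by an administrator.
- Failback – after the failed site is restored, its data is resynchronized and operations are returned to it (a switchback).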
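The synchronous write path described in the list above can be sketched in a few lines. This is a toy model, not NetApp's implementation: the `Site` class and the `synchronous_write` function are hypothetical names introduced only for illustration, and real systems mirror data at a much lower level.

```python
class Site:
    """Hypothetical stand-in for the storage at one data center."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block_id, data):
        # Store the block and return an acknowledgment.
        self.blocks[block_id] = data
        return True


def synchronous_write(local, remote, block_id, data):
    """Report success to the host only after both sites have stored the block."""
    local_ack = local.write(block_id, data)
    remote_ack = remote.write(block_id, data)
    return local_ack and remote_ack


site_a = Site("site_a")
site_b = Site("site_b")
acked = synchronous_write(site_a, site_b, 7, b"txn-0001")
```

Because the host sees the acknowledgment only after both copies exist, losing either site afterwards cannot lose an acknowledged write.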
Because every write must wait for the remote site's acknowledgment, the round-trip latency between sites adds directly to write latency. For this reason the distance between sites is limited, typically to about 300 km.
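A rough calculation shows why distance matters for synchronous replication. The sketch below assumes that light travels through optical fiber at roughly 200,000 km/s (about two thirds of its speed in a vacuum) and ignores switching and protocol overhead, so the real figures will be higher. The function name `round_trip_ms` is made up for this example.

```python
# Assumption: signal speed in fiber is about 200,000 km/s, i.e. 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def round_trip_ms(distance_km):
    """Minimum round-trip propagation delay over fiber, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_MS


for km in (10, 100, 300):
    print(f"{km} km -> {round_trip_ms(km):.2f} ms")
```

Under these assumptions, a 300 km link adds at least 3 ms to every write before any processing time is counted.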
MetroCluster Architecture
- Two storage clusters – based on NetApp ONTAP systems.
- Network connectivity – high-speed links (typically optical) for replication.
- Software layer – ensures synchronization, automatic failover, and configuration management.
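MetroCluster has traditionally been deployed in two variants (newer ONTAP releases also offer MetroCluster IP, which replicates over an Ethernet/IP network):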
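- Stretch MetroCluster – the two sites are cabled directly to each other without a switched fabric, which restricts it to short distances such as a single campus or neighboring buildings.
- Fabric-Attached MetroCluster – the sites are connected through a dedicated Fibre Channel switch fabric (a storage area network, SAN), which supports longer distances.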
Applications
MetroCluster is used in environments where zero data loss (ZDL, a recovery point objective of zero) and minimal downtime are critical:
- banking and payment systems;
- telecommunications companies;
- healthcare institutions and laboratories;
- government and military information systems;
- large enterprises with distributed IT infrastructures.
Example
A major bank deployed MetroCluster across two data centers located in different districts of a city. All transactions are synchronously recorded in both storage systems. If one site fails, operations shift to the second site, so online banking services continue without interruption. Once the failed site is restored, its data is resynchronized and normal operation resumes.
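The scenario can be walked through with a toy simulation. It is only a sketch under simplifying assumptions: the `MirroredPair` class, its methods, and the site labels "A" and "B" are hypothetical, and the resynchronization here simply replays the writes a site missed, which is far simpler than what a real storage system does.

```python
class MirroredPair:
    """Toy model of two mirrored sites (hypothetical, for illustration only)."""

    def __init__(self):
        self.data = {"A": {}, "B": {}}
        self.online = {"A": True, "B": True}
        # Writes each site missed while it was offline.
        self.pending = {"A": {}, "B": {}}

    def write(self, key, value):
        if not any(self.online.values()):
            raise RuntimeError("no site available")
        for site in self.data:
            if self.online[site]:
                self.data[site][key] = value
            else:
                self.pending[site][key] = value

    def fail(self, site):
        # The surviving site keeps serving writes on its own.
        self.online[site] = False

    def restore(self, site):
        # Resynchronize: replay missed writes, then bring the site back.
        self.data[site].update(self.pending[site])
        self.pending[site].clear()
        self.online[site] = True


pair = MirroredPair()
pair.write("txn-1", 100)   # recorded at both sites
pair.fail("A")             # site A goes down
pair.write("txn-2", 250)   # served by site B alone
pair.restore("A")          # site A catches up
```

After `restore`, both sites hold the same two transactions, which mirrors the resynchronization step in the bank example.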