some1 here

Reputation: 1056

What is the difference between Availability Zones and the fault domains of Availability Sets in Microsoft Azure?


I am working through the module "Discuss core Azure architectural components", where I came across this description of fault domains:

Fault domains. Fault domains provide for the physical separation of your workload across different hardware in the datacenter. This includes power, cooling, and network hardware that supports the physical servers located in server racks. In the event the hardware that supports a server rack becomes unavailable, only that rack of servers would be affected by the outage.

The module also explains some features of Availability Zones:

Each availability zone is an isolation boundary containing one or more datacenters equipped with independent power, cooling, and networking.

If one availability zone goes down, the other continues working.

The availability zones are typically connected to each other through very fast, private fiber-optic networks.

Availability zones allow customers to run mission-critical applications with high availability and low-latency replication.

Availability zones are offered as a service within Azure, and to ensure resiliency, there’s a minimum of three separate zones in all enabled regions.

I really do not see the difference between the two. Both concepts seem to be about physically separating parts of a single datacenter. So could someone point out the main reason for having these two terms? Don't they mean the same thing?

Can I think of it as follows? First there is a region, which contains several (at least three) Availability Zones (AZs). Each AZ contains one or more datacenters, each datacenter contains several Availability Sets, each Availability Set spans several racks, and each of those racks is a separate fault domain.

Upvotes: 4

Views: 4297

Answers (3)

yugandhar

Reputation: 630

In simple terms, Azure infrastructure is spread across multiple locations around the world, and each location is a region. For example, West US 2 (Washington) is one region, West US 3 (Phoenix) is another, and Central India is another. Since regions are usually thousands of miles apart, spreading a workload across regions lets it tolerate failures such as large earthquakes and storms.

Each region is divided into multiple availability zones (often three or more). Each availability zone is a group of one or more datacenters. Availability zones are usually spread out within the region (they could be around 50 miles apart) so that they can tolerate failures such as power outages, floods, and moderate earthquakes. The availability zones are connected by a high-bandwidth private network.

Each datacenter is usually made up of multiple racks, and each rack has its own power source and network switch. So if the network switch or power source of one rack fails, the other racks are not affected. It is therefore better to spread virtual machines across multiple racks (called fault domains, or FDs) to improve availability.

However, planned maintenance or updates that require a reboot must be rolled out gradually, updating only part of a rack at a time, so that availability is preserved. To speed up the rollout, parts of multiple racks are updated at the same time. Such a group of rack parts is called an update domain (UD).

As an example, suppose we have 3 racks for 6 machines. vm#1 and vm#4 are in the same rack but in different update domains, so during planned maintenance it is guaranteed that only one of vm#1 and vm#4 can be down at a time (improving availability during planned maintenance). Conversely, vm#1 and vm#6 are in different racks but in the same update domain, so if planned maintenance is applied to update domain #1, both vm#1 and vm#6 go down (the trade-off for a faster rollout).
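The rack/update-domain example can be reproduced with a small simulation. This is a sketch under stated assumptions, not Azure's actual placement algorithm: it assumes VMs are assigned round-robin to 3 fault domains and 5 update domains (Azure documents round-robin placement across update domains, with 5 UDs by default; round-robin assignment to fault domains is an assumption here). `place_vms`, `down_during_rack_failure` and `down_during_update` are hypothetical helpers.

```python
from dataclasses import dataclass

# Assumed layout, chosen to match the example: 3 racks (fault domains)
# and 5 update domains, with VMs assigned round-robin to both.
NUM_FAULT_DOMAINS = 3
NUM_UPDATE_DOMAINS = 5


@dataclass(frozen=True)
class Placement:
    vm: int             # 1-based VM number, as in "vm#1"
    fault_domain: int   # 1-based rack / fault domain number
    update_domain: int  # 1-based update domain number


def place_vms(count, num_fds=NUM_FAULT_DOMAINS, num_uds=NUM_UPDATE_DOMAINS):
    """Hypothetical helper: assign each VM an (FD, UD) pair round-robin."""
    return [
        Placement(vm=i + 1,
                  fault_domain=i % num_fds + 1,
                  update_domain=i % num_uds + 1)
        for i in range(count)
    ]


def down_during_rack_failure(placements, fd):
    """Unplanned outage: every VM in the failed rack goes down."""
    return [p.vm for p in placements if p.fault_domain == fd]


def down_during_update(placements, ud):
    """Planned maintenance: every VM in the patched update domain reboots."""
    return [p.vm for p in placements if p.update_domain == ud]


if __name__ == "__main__":
    vms = place_vms(6)
    for p in vms:
        print(p)
    print("rack 1 fails ->", down_during_rack_failure(vms, 1))
    print("update domain 1 patched ->", down_during_update(vms, 1))
```

With six VMs this model matches the example: a failure of rack 1 takes down vm#1 and vm#4, while patching update domain 1 reboots vm#1 and vm#6, because the sixth VM wraps around to the first update domain.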

(Images are taken from the official Azure documentation and azurecloud.expert)

Upvotes: 0

Sarye Haddadi

Reputation: 7456

When you create an Availability Set, it is fully enclosed within a single Availability Zone. Thus, an app spread across an Availability Set has a lower SLA (99.95%) than an app spread across Availability Zones (99.99%).

  • Within that single zone, an Availability Set can spread VMs across up to 3 fault domains and up to 20 update domains. The source of confusion is that you might imagine using the fault domains of different zones, but no: an app spread across an Availability Set stays within a single zone.
  • An Azure region that supports Availability Zones has at least 3 Availability Zones, each made up of one or more datacenters. Some regions have fewer than 3 zones, but Microsoft has said it is committed to bringing Availability Zone support to more regions.

And as per the definition you quoted:

Fault domains. Fault domains provide for the physical separation of your workload across different hardware in the datacenter (1 Datacenter = 1 Availability Zone).

Azure VMs have 3 levels of availability:

  • Standalone VM — 99.9% availability (43m28s downtime per month)
  • Availability Sets — 99.95% availability (21m44s downtime per month)
  • Availability Zones — 99.99% availability (4m21s downtime per month)
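The downtime figures follow from simple arithmetic: allowed downtime = month length × (1 − SLA). The sketch below makes the month length an explicit parameter, because the exact figures depend on it; the numbers quoted in the list appear to assume a month of about 30.185 days. `max_monthly_downtime_seconds` and `fmt` are illustrative helpers, not part of any Azure tooling.

```python
def max_monthly_downtime_seconds(sla_percent, days_in_month=30.0):
    """Maximum downtime (in seconds) a monthly SLA allows.

    days_in_month is an assumption: calendar months vary, and different
    sources pick different averages, so results differ by a few seconds.
    """
    month_seconds = days_in_month * 24 * 60 * 60
    return month_seconds * (1 - sla_percent / 100)


def fmt(seconds):
    """Format seconds as e.g. '43m12s', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m{secs:02d}s"


if __name__ == "__main__":
    tiers = {"Standalone VM": 99.9,
             "Availability Sets": 99.95,
             "Availability Zones": 99.99}
    for name, sla in tiers.items():
        print(name, sla, fmt(max_monthly_downtime_seconds(sla)))
```

With the default 30-day month, 99.9% allows 43m12s per month; passing `days_in_month=30.185` reproduces the 43m28s, 21m44s and 4m21s figures listed above.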

See: VM availability in Azure - Crishantha Nanayakkara

Upvotes: 0

Emanuel V

Reputation: 153

I think the Availability Zone IS the datacenter, and you have multiple zones within a region. The fault domain can be thought of as WITHIN the datacenter (going by the description you included). The domains are further segregated as described below.

Regions and Availability Zones in Azure

"An Availability Zone in an Azure region is a combination of a fault domain and an update domain. For example, if you create three or more VMs across three zones in an Azure region, your VMs are effectively distributed across three fault domains and three update domains. The Azure platform recognizes this distribution across update domains to make sure that VMs in different zones are not scheduled to be updated at the same time."
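The quoted behaviour can be modelled in a few lines: if each zone counts as both a fault domain and an update domain, then the same calculation covers an unplanned zone outage and a planned update of one zone. This is a hypothetical sketch; `spread_across_zones` and `survivors` are illustrative helpers, not Azure APIs, and the round-robin assignment is an assumption. Only the zone labels "1", "2", "3" follow Azure's convention for logical zones.

```python
# Logical zone labels as exposed in zone-enabled Azure regions.
ZONES = ("1", "2", "3")


def spread_across_zones(vm_names, zones=ZONES):
    """Hypothetical helper: assign VMs to zones round-robin."""
    return {vm: zones[i % len(zones)] for i, vm in enumerate(vm_names)}


def survivors(placement, unavailable_zone):
    """VMs still running while one zone is down or being updated.

    Each zone acts as its own fault domain AND update domain, so a zone
    outage and a zone-by-zone update affect exactly the same VMs.
    """
    return sorted(vm for vm, zone in placement.items() if zone != unavailable_zone)


if __name__ == "__main__":
    placement = spread_across_zones(["vm1", "vm2", "vm3"])
    for zone in ZONES:
        print(f"zone {zone} unavailable ->", survivors(placement, zone))
```

With three VMs in three zones, losing or updating any one zone still leaves two VMs running, which is the guarantee the quoted documentation describes.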

Upvotes: 5
