What is a failover cluster?
SQL Server failover clusters are made up of a group of servers (nodes) that run cluster-aware applications in a way designed to minimize downtime. A failover is the process in which, when one node crashes or becomes unavailable, another node takes over and restarts the application automatically, without human intervention.
What does SQL Server failover clustering provide?
A SQL Server failover cluster is also known as a high-availability cluster, because it provides redundancy for critical systems. The main idea behind failover clustering is to eliminate single points of failure by including multiple network connections and shared data storage connected via a storage area network (SAN) or network-attached storage (NAS).
SQL Server failover cluster configurations
There are four main node configurations available in SQL Server failover clustering: Active/Active (multi-instance failover cluster), Active/Passive, N+1, and N+M.
1. Active-Active Cluster or Multi-Instance Cluster
Active/Active means that both nodes are active, and each runs its own independent SQL Server instance on its own shared disk resources. When a node fails, you need to be sure that the remaining node has enough resources (CPU, memory) to handle the additional databases that fail over to it. You can think of it like this: Node A has one database on it, and Node B has one database on it. If Node A goes down, its resources fail over to Node B, and Node B now has two databases running on it.
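The capacity check described above can be sketched in a few lines of Python. The `Node` class, the `fail_over` function, the instance names, and the memory figures are hypothetical illustrations. They are not part of SQL Server or Windows Server Failover Clustering, and the model assumes memory is the only constrained resource.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node with a memory budget and the instances it hosts."""
    name: str
    capacity_gb: int
    instances: dict[str, int] = field(default_factory=dict)  # instance name -> memory needed (GB)

    def used_gb(self) -> int:
        return sum(self.instances.values())


def fail_over(failed: Node, survivor: Node) -> bool:
    """Move every instance from `failed` to `survivor` if the survivor has room.

    Returns False and moves nothing when the combined load would exceed the
    survivor's capacity.
    """
    if survivor.used_gb() + failed.used_gb() > survivor.capacity_gb:
        return False
    survivor.instances.update(failed.instances)
    failed.instances.clear()
    return True


# Node A and Node B each run one instance; Node A goes down.
node_a = Node("NodeA", capacity_gb=64, instances={"INST_A": 24})
node_b = Node("NodeB", capacity_gb=64, instances={"INST_B": 24})
print(fail_over(node_a, node_b))  # True, because 24 + 24 <= 64
print(sorted(node_b.instances))   # ['INST_A', 'INST_B']
```

If each node were sized at 32 GB, the same failover would return False. That is the situation the paragraph warns about.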
2. Active-Passive Cluster
In an Active/Passive cluster, the database runs on only one node at any given time. Node A is active with one database, and Node B is passive with none. If Node A goes down, the resources fail over to Node B, which becomes active with one database running on it.
Active/passive failover clusters have standby nodes that are activated only when the primary node is down. The primary node owns all the resources. In case of a failure, the standby node takes over all the resources and recovers the database from the database files and transaction logs.
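The ownership rule described above (the primary owns everything, and the standby takes over everything) can be modeled as a tiny state machine. This is a minimal Python sketch. `ActivePassiveCluster` and the resource names are hypothetical. The actual recovery from data files and transaction logs is only noted in a comment, not simulated.

```python
class ActivePassiveCluster:
    """Hypothetical two-node model in which one node owns every resource."""

    def __init__(self, primary: str, standby: str, resources: list[str]) -> None:
        self.up = {primary: True, standby: True}  # node name -> is the node running?
        self.owner = primary                       # the active node owns all resources
        self.resources = list(resources)

    def node_down(self, node: str) -> None:
        self.up[node] = False
        # Losing the passive node changes nothing; losing the owner triggers failover.
        if node == self.owner:
            self._fail_over()

    def _fail_over(self) -> None:
        survivors = [name for name, running in self.up.items() if running]
        if not survivors:
            raise RuntimeError("no surviving node can take over the resources")
        # The standby takes ownership of every resource at once. In SQL Server it
        # would then start the instance and recover the databases from the data
        # files and transaction logs on the shared storage (not simulated here).
        self.owner = survivors[0]


cluster = ActivePassiveCluster(
    "NodeA", "NodeB", ["Network name", "IP address", "Shared disk", "SQL Server service"]
)
print(cluster.owner)  # NodeA
cluster.node_down("NodeA")
print(cluster.owner)  # NodeB
```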
3. N+1 Cluster
An N+1 failover cluster is based on active/passive nodes, where two or more active nodes share the same standby (failover) node. If all N active nodes fail, the standby node must be capable of taking over the entire load.
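The sizing requirement for the shared standby comes down to simple arithmetic. The following Python sketch uses a hypothetical `standby_can_absorb` helper and made-up load units. It shows a standby that covers any single failure but is too small for the all-nodes-fail case described above.

```python
def standby_can_absorb(failed_loads: list[int], standby_capacity: int) -> bool:
    """True if a single shared standby node can host the load of every failed node."""
    return sum(failed_loads) <= standby_capacity


# Hypothetical N+1 cluster: N = 3 active nodes share one standby node.
# Loads and capacity are in arbitrary, illustrative units.
active_loads = {"Node1": 30, "Node2": 20, "Node3": 25}
standby_capacity = 60

# One active node fails: the standby has room.
print(standby_can_absorb([active_loads["Node1"]], standby_capacity))  # True (30 <= 60)

# All N active nodes fail at once: this standby is undersized.
print(standby_can_absorb(list(active_loads.values()), standby_capacity))  # False (75 > 60)
```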
4. N+M Cluster
An N+M failover cluster has two or more active nodes and two or more standby nodes. It costs more to implement than the N+1 configuration because of the extra standby hardware, but it offers more redundancy, because the load of failed nodes can be distributed across more than one standby node.
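To show why more than one standby node helps, here is a minimal Python sketch that spreads the load of failed nodes across M standby nodes. The `place_failed_loads` function and all node names and numbers are hypothetical. The placement strategy is a simple greedy heuristic (largest load first, onto the standby with the most free capacity). It is not an optimal bin-packing solver and not how Windows Server Failover Clustering chooses owners.

```python
from typing import Optional


def place_failed_loads(failed: dict[str, int],
                       standby_capacity: dict[str, int]) -> Optional[dict[str, str]]:
    """Greedily assign each failed node's load to a standby node.

    Returns {failed_node: standby_node}, or None if some load does not fit.
    """
    free = dict(standby_capacity)  # remaining capacity per standby node
    placement: dict[str, str] = {}
    # Place the largest loads first, each on the standby with the most free room.
    for node, load in sorted(failed.items(), key=lambda item: item[1], reverse=True):
        if not free:
            return None
        target = max(free, key=free.get)
        if free[target] < load:
            return None
        free[target] -= load
        placement[node] = target
    return placement


failed = {"Node1": 30, "Node2": 25}
# Two standby nodes (M = 2) can absorb both failures...
print(place_failed_loads(failed, {"StandbyA": 40, "StandbyB": 40}))
# {'Node1': 'StandbyA', 'Node2': 'StandbyB'}
# ...but a single standby of the same size cannot.
print(place_failed_loads(failed, {"StandbyA": 40}))  # None
```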
Frequently Asked Questions for Clustering SQL Server
Q: Can I install every SQL Server component on my cluster?
A: Nope. SQL Server Integration Services is not “cluster-aware” and can’t fail back and forth with your cluster.
Q: How long does it take to fail over?
A: Several factors affect failover time. There’s the time for the SQL Server instance’s service to stop on one node, be started on another node, and finish starting up. This stop-and-start time includes normal database recovery time. If you need to keep failovers within an SLA, test failover times during planned downtime, but also estimate how long a failover might take if it happened at peak load.
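One way to frame that estimate is to add up the phases and compare the total against the SLA. This Python sketch is illustrative only. The `FailoverEstimate` class and every number in it are hypothetical placeholders to replace with your own measurements. The peak-load figures assume that recovery takes longer when more transaction log activity is in flight.

```python
from dataclasses import dataclass


@dataclass
class FailoverEstimate:
    # All values are in seconds and are placeholders, not measurements.
    stop_on_old_node: float
    start_on_new_node: float
    database_recovery: float

    def total(self) -> float:
        return self.stop_on_old_node + self.start_on_new_node + self.database_recovery

    def within_sla(self, sla_seconds: float) -> bool:
        return self.total() <= sla_seconds


planned = FailoverEstimate(stop_on_old_node=10, start_on_new_node=20, database_recovery=15)
peak = FailoverEstimate(stop_on_old_node=25, start_on_new_node=20, database_recovery=120)
print(planned.total(), planned.within_sla(60))  # 45 True
print(peak.total(), peak.within_sla(60))        # 165 False
```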
Q: Can I cluster a virtualized server?
A: Yes, you can build failover clusters from virtual machines running on VMware or Hyper-V and install SQL Server on them. I think this is great for learning and testing, but I’m not crazy about it for production environments. Before you go too far down this path, look at leveraging your hypervisor’s native high-availability features.
Q: Why do you make such a big deal about the shared storage?
A: Because not everyone has robust shared storage available. You want to make sure you’re using shared storage that has redundancy in all the right places, because in a failover cluster shared storage is a single point of failure, no matter how magical the SAN seems. This also means that if your data is corrupted, it’s going to be corrupted no matter which node you access it from.
Q: What’s the minimum number of nodes in a failover cluster?
A: One. This is called a single-node cluster. It is useful for testing, and also when you have a two-node cluster and need to do work on one of the nodes: you can evict that node without destroying the cluster.
Q: Can I use clustering for Disaster Recovery?
A: Yes, but it requires some fancy setup. Most SQL Server clusters are installed in a single subnet in a single datacenter and are suitable for high availability. If you want to look into multi-site clustering, “geo-clustering” became available with SQL Server 2008 and was enhanced in SQL Server 2012 with multi-subnet support. Note: you’ll need storage magic like SAN replication to get your geo-cluster on.
Q: Does it matter which version of Windows I use?
A: Yes, it matters a lot. Plan to install your Windows Failover Cluster on the most recent version of Windows Server. On Windows Server 2008 and 2008 R2, failover clustering requires Enterprise or Datacenter edition; starting with Windows Server 2012, Standard edition includes it as well. If you must use an older version of Windows, make sure it’s at least Server 2008 with the latest service packs installed. The Failover Clustering component of Windows was rewritten for Server 2008, so if you run on older versions you’ll have fewer features and you’ll be stuck chasing old problems.
Q: What is Quorum?
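A: Quorum is a count of voting members, a way of taking attendance of the cluster members that are present. The cluster uses quorum to decide whether it can stay online. If too few voting members can communicate, the cluster stops its services, which prevents two isolated groups of nodes from both running the same resources.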
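The core of the attendance check is a majority vote. Here is a simplified Python sketch of that rule. `has_quorum` is a hypothetical helper. Real Windows Server Failover Clustering quorum configurations (node majority, disk or file share witness, and dynamic quorum in Windows Server 2012 and later) add details this model ignores.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Simple majority rule: strictly more than half of all configured votes must be present."""
    return votes_present > total_votes // 2


# Two nodes, no witness: losing one node loses quorum (1 of 2 is not a majority).
print(has_quorum(votes_present=1, total_votes=2))  # False

# Two nodes plus a witness vote: losing one node keeps quorum (2 of 3).
print(has_quorum(votes_present=2, total_votes=3))  # True

# A network failure splits four nodes into two groups. At most one side can
# hold a majority, so both sides never keep running the same resources.
print(has_quorum(2, 4), has_quorum(2, 4))  # False False (even split: nobody stays up)
print(has_quorum(3, 4), has_quorum(1, 4))  # True False
```

Because a majority is strictly more than half, an even split leaves neither side online. This is why two-node clusters usually add a witness as a tie-breaking vote.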