AlwaysOn encompasses the SQL Server solutions for mission-critical high availability and disaster recovery. Two solutions were included under the AlwaysOn umbrella in SQL Server 2012: Availability Groups and Failover Cluster Instances. Each solution has different characteristics, making them appropriate for different scenarios, and both can be combined in the same deployment.
In SQL Server 2014, we are enhancing the availability, scalability, and flexibility of these solutions.
In this blog, we remind you of the AlwaysOn solutions and then describe the enhancements in SQL Server 2014 CTP1. In a future blog, we’ll describe the enhancements in CTP2.
Availability Groups
Availability Groups (AGs), introduced in SQL Server 2012, integrate and extend features from Database Mirroring and Log Shipping.
AGs provide high availability for a group of databases through redundant replicas hosted by up to 5 SQL Server instances (1 primary, 4 secondaries). Each SQL Server instance has its own copy of the databases (on its local disks), and AGs continuously synchronize transactions from the primary replica to the secondary replicas. Secondaries can be configured as synchronous or asynchronous allowing to trade data safety for performance. Similarly, they can be configured for automatic or manual failover to account for automated and manual processes. Secondary replicas maintain hot database copies (online state) so that failovers take only seconds.
AGs enable integrated high availability and disaster recovery configurations without the need for shared storage (e.g. SAN). In addition, secondary replicas can be used to offload read workloads and backups from the primary replica. A SQL Server instance can host replicas for multiple AGs, allowing the distribution of primary replicas across different SQL Server instances. AGs provide many other capabilities such as a Listener (virtual network name) for client connectivity, flexible failover policies, automatic page repair, full cross-feature support, compression, and encryption.
In SQL Server 2014 CTP1, AGs are enhanced with:
- Increased availability of readable secondaries in multi-site environments
- Increased number of secondaries
- Enhanced Diagnostics
- Integration with Windows Azure
Increased availability of readable secondaries in multi-site environments
Readable secondary replicas now allow read workloads to run without any disruption even in the presence of lasting network failures or loss of quorum on the underlying Windows Server Failover Cluster.
This is especially desirable in large-scale geo-distributed environments where network disconnections are not that unusual. This is depicted in the picture below, reports running on readable secondaries far from the main data center will continue running despite a network disconnection. Similarly, new reports started during the network disconnection will run.
Using the AlwaysOn Dashboard or DMVs, you can detect that a secondary replica is disconnected from the primary and determine the last time and transaction LSN that the secondary committed.
Increased number of secondaries
Many customers use AGs to satisfy their high availability, disaster recovery, and reporting requirements in a single solution (to configure and manage). The fact that AG log synchronization is many times faster than other technologies (e.g. Replication or Log Shipping) motivates use readable secondaries for reporting.
Thus, we have increased the maximum number of secondaries from 4 to 8.
This facilitates the following scenarios:
a) Distribution of replicas in a geo-distributed environment
Additional replicas can be deployed across the geo-distributed environment, allowing read workloads to run against a local replica.
b) Scaling-out of read workloads
Additional replicas can be used to load balance read workloads.
Load balancing can be implemented using simple DNS round-robin or specialized (hardware or software) load balancing solutions.
The picture below depicts the increased number of replicas in Object Explorer and the AlwaysOn Dashboard.
Enhanced Diagnostics
We have enhanced the diagnostics information for troubleshooting issues.
This involves:
- Simplifying error messages. For example, separating the generic error “replica can’t become primary” into multiple messages, each with a specific cause: “replica is not synchronized”, “windows cluster was started in forced quorum mode”, etc
- Making information easier to find and correlate. For example, adding names (AG, replica, and database) to all AlwaysOn XEvents or making additional columns more discoverable in the AlwaysOn Dashboard.
There are many more coming in CTP2. For example, allowing to view XEvents in UTC time, triggering XEvents when replicas change synchronization state, and recording the last time and transaction LSN committed when a replica goes to resolving state.
Integration with Windows Azure Infrastructure Services
This year we started supporting AGs on Windows Azure Infrastructure Services (SQL Server running on virtual machines). We support SQL Server 2012 and now SQL Server 2014.
This enables two scenarios:
a) High Availability for SQL Server databases hosted in Windows Azure
Configure a synchronous secondary replica for automatic failover in case of failure (SQL Server or VM), guess patching (SQL Server or OS), or Windows Azure’s host upgrade.
b) Disaster Recovery for on-premise SQL Server databases using Windows Azure
Configure one or more asynchronous secondary replicas for your on-premise databases. This allows manually failing over to an Azure replica in case of a failure impacting the on-premise environment. The Azure replicas can be used to offload read workloads and maintain additional backups.
The picture below depicts the canonical scenario. The AG has 2 replicas on-premise for high availability and a 3rd replica on Windows Azure for disaster recovery.
Implementing this scenario requires configuring a site-to-site VPN tunnel between your on-premise network and Windows Azure. You can use either a physical VPN device or Windows Server 2012 Routing and Remote Access Service (RRAS).
In CTP2, we’ll release a wizard to greatly simplify adding a replica on Azure.
Failover Cluster Instances
Failover Cluster Instances (FCIs) enhance the traditional SQL Server Failover Clustering feature. FCIs provide high availability for a SQL Server instance through redundant servers that share the same storage (generally a SAN) and appear as a single logical SQL Server instance. Only one active server has access to the storage and can run workloads. When this instance becomes unavailable, the storage ownership moves to a different server where SQL Server is started. SQL Server startup, which includes databases recovery, can take from seconds to a couple of minutes. The secondary servers are passive (can’t run any workloads).
Given that FCI ensures availability for a full SQL Server instance, it is suited for servers with many instance-level dependencies (e.g. jobs or SSIS packages) or for consolidation, hosting hundreds to thousands of databases.
In SQL Server 2012, FCIs were enhanced to support multi-subnet clusters, faster and more predictable failover times, flexible failover policies, and tempDB support on local disk.
In SQL Server 2014 CTP1, FCIs are enhanced with:
- Support for Cluster Shared Volumes
- Enhanced Diagnostics
Support for Cluster Shared Volumes
Cluster Shared Volumes (CSVs) is a feature of Windows Server Failover Cluster. A CSV is a shared disk that is made accessible to all nodes in a Windows Server Failover Cluster. CSVs build a common global namespace across the cluster (accessible under the %SystemDrive%\ClusterStorage root directory). This allows creating as many shared volumes as desired in the shared storage, removing the 24 drive letter limitation. For FCI this means that you can create as many FCIs as desired, without having to manage mount points.
CSVs also increase the resiliency of the cluster by having I/O fault detection and recovery over alternate communication paths between the nodes in the cluster. This is built on top of the Server Message Block (SMB) protocol. For FCI this means that, if the active server is not able to read/write directly to the shared storage, the CSV will automatically re-route the request through another node that has access. In addition, if the shared storage has any transient issues, CSV will transparently cache and re-try the operation.
Finally, CSVs increase the reliability of failovers, as disks don’t need to be unmounted and mounted as with traditional cluster disks. On failover, the new primary SQL Server instance can simply access the CSV.
CSVs are supported on Windows Server 2012 and Windows Server 2012 R2.
The picture below depicts two CSVs (Disk 1 and Disk 4) registered in Windows Server Failover Cluster. These CSVs can be selected as the shared disks during the installation of a FCI.
Enhanced Diagnostics
We made some errors easier to understand. In CTP2, we’ll display information about the underlying Windows Server Failover Cluster in AlwaysOn DMVs.
Try SQL14 CTP1 and give us feedback! Download it or try it on Windows Azure.
Let us know if you would like to join the Technology Adoption Program (TAP) and give us your feedback.