Understanding the Value of CloudOps
August 31, 2016
CloudOps, or cloud operations, is the formalization of best practices and procedures that allow cloud-based platforms, and the applications and data that live there, to function well over a long period of time. This is an important concept because the success of cloud computing won’t be defined by your ability to get workloads into the cloud, but by your ability to solve real business problems over time.
While many cloud platform features aren't new, the rise of cloud brings the more advanced of them to public cloud-based platforms. Therefore, those charged with operations, or CloudOps, need to define the right operational procedures and practices around what clouds can do, rather than morph traditional approaches to operations to fit the cloud.
In other words, while moving to cloud is an important concept right now, the ability to stay there, and be effective, depends upon operational procedures and operational excellence.
CloudOps relies on continuous operations. This is the approach to operations that's emerging from best practices around DevOps. Continuous operations means running cloud-based systems in such a way that there's never a need to take part or all of an application out of service, in pursuit of a zero-downtime goal.
To achieve this objective, the software must be updated and placed into production without any interruption in service. Thus, continuous operations, as related to CloudOps, means installing mechanisms that allow zero downtime procedures to occur.
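One common mechanism for updating software without interrupting service is a rolling update: instances are taken out of rotation, updated, and returned one at a time, so some capacity is always serving. The following is a minimal sketch under stated assumptions, not a production deployment tool. The names `Instance`, `rolling_update`, and `min_in_service` are hypothetical, and the version assignment stands in for whatever the real deployment step would be.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical model of one application server behind a load balancer.
    name: str
    version: str
    in_service: bool = True


def rolling_update(instances, new_version, min_in_service=1):
    """Update instances one at a time, never dropping below min_in_service."""
    for inst in instances:
        serving = sum(1 for i in instances if i.in_service)
        # Refuse to drain an instance if doing so would leave too little capacity.
        if serving - 1 < min_in_service:
            raise RuntimeError("not enough capacity to update without downtime")
        inst.in_service = False     # drain: stop routing traffic to this instance
        inst.version = new_version  # stand-in for the real deployment step
        inst.in_service = True      # return the updated instance to the pool
    return instances


fleet = [Instance(f"app-{n}", "1.0") for n in range(3)]
rolling_update(fleet, "1.1")
print([i.version for i in fleet])
```

Note that a fleet of one cannot be updated this way with `min_in_service=1`; the sketch raises an error instead, which is the point of keeping redundant capacity.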
Continuous operations meshes with the DevOps movement that’s underway right now. DevOps is the combining of development and operations. However, it’s also about the automation of everything. We automate development processes, testing, staging, integration, deployment, and, of course, operations.
So, continuous operations is about never stopping operations. Thus it’s about zero downtime and the automation of most aspects of operations. That makes redundancy a best practice, both at the cloud provider layer, which provides a certain amount of resiliency, and at the application layer, which you control rather than the cloud provider.
Moreover, it’s the automation of failover, backup, recovery, provisioning, etc., to ensure that the applications never stop providing service and the end users never experience outages. Processing doesn’t stop when updates to software are distributed.
More impressive, updates to hardware are made behind the scenes using virtual machine instances that can be moved, swapped, or stopped, also without interfering with the normal operations of the cloud-based applications.
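Automated failover, mentioned above, can be illustrated with a very small sketch: given an ordered list of nodes and a health check, traffic goes to the first node that is healthy. The function name `pick_active`, the node names, and the dictionary-based health check are assumptions made for illustration; a real system would probe the nodes over the network.

```python
def pick_active(nodes, is_healthy):
    """Return the first healthy node, preferring earlier entries (the primary)."""
    for node in nodes:
        if is_healthy(node):
            return node
    raise RuntimeError("no healthy node available")


# Hypothetical health status: the primary has failed, both replicas are up.
health = {"primary": False, "replica-1": True, "replica-2": True}
active = pick_active(["primary", "replica-1", "replica-2"], lambda n: health[n])
print(active)  # replica-1
```

Because the choice is made by code rather than by an operator, the switch can happen without the end users noticing an outage.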
Why CloudOps Now?
The real question is: How does cloud change things? Or, what does cloud computing bring to the table that makes operations different? Consider the following:
- The ability to scale out, or expand capacity at any time. Clouds provide the ability to self- or auto-provision servers. This feature adds a great deal of value, but can be a challenge to manage.
- The distributed and stateless nature of cloud-based platforms means that operations have to adjust to management that could span across the world.
- Infrastructure agnostic: Clouds can abstract the underlying infrastructure from the platforms and applications.
- Location transparent: We don’t really care where the physical servers exist, and we must manage them the same way.
- Latency tolerant: Latency can vary a great deal, and you’ll need to operate and manage clouds with that variability in mind.
- Loosely coupled: Clouds run applications that share common services, and are not bound together.
- Leveraging data that is shared, replicated, and distributed means that data is not centrally located, and is either physically or logically separated.
- Automated: Much of the operations for clouds leverage a great deal of automation.
- Self-healing: Cloud uses automation as a way to fix common operational problems without affecting the applications or users.
- Dual active (or active/active): The cloud uses a network of independent processing nodes, each with access to a replicated database, so that every node can serve the same application.
- Finally, metered cost, or usage-based accounting, bills cloud usage to the requesting resource or user.
While many of these platform features are not new, the rise of cloud brought many of the more advanced options to public cloud-based platforms. Those who are charged with operations, or CloudOps, need to define the right operational procedures and practices around what clouds can do, rather than morph traditional approaches to operations to fit the cloud.
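Two items from the list above, scaling out and metered cost, interact in a way operations has to manage: every auto-provisioned server adds to the bill. The sketch below is a rough illustration under assumed numbers, not any provider's actual scaling or pricing model. The function names, the per-instance capacity, the redundancy floor of two instances, and the hourly rate are all hypothetical.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance, minimum=2):
    """Instances needed to carry the load, with a floor kept for redundancy."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(minimum, needed)


def metered_cost(instance_hours, rate_per_hour):
    """Usage-based charge: pay only for the instance-hours consumed."""
    return instance_hours * rate_per_hour


# Assumed load: 950 requests/sec, each instance handling 100 requests/sec.
count = desired_instances(950, 100)
print(count)  # 10

# Assumed rate of 0.10 per instance-hour, for one day at that instance count.
print(metered_cost(count * 24, 0.10))
```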
Redundancy seems to be core to all good cloud operations. Years ago, the use of redundant systems was costly, so most of those charged with running systems relied on a single server. When the server was being updated with new patches and fixes, operations had to stop.
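The value of redundancy can be made concrete with a standard back-of-the-envelope calculation. Assuming server failures are independent (a simplification that real systems only approximate), a service that stays up as long as at least one of n servers is up has availability 1 - (1 - a)^n, where a is the availability of a single server. The 99 percent figure below is an assumed example, not a measurement from the source.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(single, replicas):
    """Availability when any one of `replicas` independent servers suffices."""
    return 1 - (1 - single) ** replicas


def expected_downtime_hours(availability):
    """Expected hours of downtime per year at the given availability."""
    return (1 - availability) * HOURS_PER_YEAR


# Compare a single server with a redundant pair, each assumed 99% available.
for n in (1, 2):
    a = combined_availability(0.99, n)
    print(n, round(a, 6), round(expected_downtime_hours(a), 2))
```

Under these assumptions, a second server cuts expected downtime from roughly 88 hours a year to under one hour.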
For many, the notion of downtime, or outages, is a common thing. Most enterprises experience several outages, both planned and unplanned, each year. A September 2013 study, conducted by the Ponemon Institute and sponsored by Emerson Network Power, reports that unplanned data center outages remained a significant threat to organizations, in terms of lost revenue. Unplanned outages were so feared that most of the survey respondents, 84 percent, said they “would rather walk barefoot over hot coals than have their data center go down.”
Moreover, 91 percent of survey respondents reported having experienced an unplanned data center outage in the past 24 months, with the frequency of outages reported at an average of two complete data center outages during the past two years. Partial outages, or those limited to certain racks, occurred six times in the same timeframe. The average number of device-level outages, or those limited to individual servers, was the highest at 11.
So, how do we improve these numbers with the use of cloud? It’s a matter of setting expectations of zero downtime that have never been set before. Mediocrity, in terms of operations, was and is the norm.
If the cloud is going to improve this problem, we need to take full operational advantage of it, hence the need for new concepts such as CloudOps to provide that foundation. On the other hand, this is an emerging space. If we say that CloudOps is a well-known best practice and discipline, that would be less than truthful. However, we need to get better at CloudOps if cloud-based applications are to deliver the promised business value.
So what should you do? The first step is to actually embrace the concept that things will likely change around operations. The worst thing you can do is to continue to force the current on-premises operations model onto your cloud operations, and hope that it will work. It won’t.
This is about adapting your operations to cloud computing, but also changing your expectations, tools, and talent in the process.