Development and operations (DevOps) tools such as Puppet and Chef automate changes to system configurations. Some teams use these tools, along with other frameworks, to automate the creation of the entire production Web server environment, sometimes on public services such as Amazon Web Services and sometimes in a local environment.
The problem with automating rollouts is that the automation is itself code, and code has bugs. A configuration change meant for QA (say, one that directs users to a test environment) could be propagated to production, leaving users logged into an environment that looks real but will never actually ship products. (Don’t laugh too hard: Last month this happened to one of my clients, a multibillion-dollar retail operation moving to customer self-service.)
The good news is that as new risks emerge, so, too, do new techniques to manage them. Here are a few things to think about in any cloud transition.
Amazon’s Elastic Compute Cloud (EC2) has occasional, unpredictable outages. Even a company that doesn’t rely on Amazon is exposed: if it uses Chef or Puppet to automate system administration, those tools run on code, and that code can have defects.
Here are a few possible problems with a cloud implementation:
+ A feature is deployed to production but disabled by a configuration flag. A programmer switches the feature on in the GUI, but the behavior remains "off."
+ A private cloud manager designed to roll out new servers over time has a defect in the "reaper" process that turns off old instances.
+ Mistakes in the merge process can put test settings, such as database connections, server names and URLs, into production.
+ API issues arise, especially when a third-party API changes after the code has "passed" in the test environment.
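The merge mistake in the third item is one that can be partly caught by a script before a release goes out. The sketch below is a hypothetical pre-deployment guard, not a feature of Chef or Puppet. It assumes the merged configuration is available as a flat dictionary of strings, and it assumes test resources follow a naming convention in which "test", "qa" or "staging" appears as a separate token in host names and URLs. Both assumptions would have to be adapted to a team's real conventions.

```python
"""Hypothetical pre-deployment guard: refuse to promote a configuration
that still references test-environment resources."""

import re
from typing import Mapping

# Assumed naming convention (illustrative only): test resources contain
# "test", "qa" or "staging" as a separate token in host names or URLs.
TEST_MARKERS = re.compile(r"(?<![a-z0-9])(test|qa|staging)(?![a-z0-9])", re.IGNORECASE)


def find_test_settings(config: Mapping[str, str]) -> list[str]:
    """Return the keys whose values look like test-environment settings."""
    return sorted(key for key, value in config.items() if TEST_MARKERS.search(value))


def check_production_config(config: Mapping[str, str]) -> None:
    """Raise before deployment if any value appears to point at a test resource."""
    suspicious = find_test_settings(config)
    if suspicious:
        raise ValueError("test settings found in production config: " + ", ".join(suspicious))


# Example: a merged configuration with two leftover test settings.
merged = {
    "db_host": "db-qa-01.internal.example.com",
    "checkout_url": "https://shop.example.com/checkout",
    "login_redirect": "https://staging.example.com/login",
}
print(find_test_settings(merged))  # ['db_host', 'login_redirect']
```

Matching whole tokens rather than substrings keeps a value such as `https://example.com/latest` from being flagged. A check like this catches only the test settings its patterns describe, so it reduces the risk rather than eliminating it.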
Any of these problems could surface first in production. In fact, they’re likely to, with no visible sign in the test environment. A week of phone calls and interviews, plus a trip to San Diego to discuss the issue in person at the Software Test Professionals Conference, led me to conclude that there are no easy answers.
A traditional test approach won’t find these problems. Instead, the people I interviewed recommended one of two strategies: change the architecture to reduce risk, or monitor, test and (quickly) fix issues in production.
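For the second strategy, one common pattern is to compare desired state with observed state in production and alert on any difference. The sketch below is a hypothetical illustration: the `Instance` record, its fields and the version strings are invented, and it assumes some other process (a health endpoint, or a synthetic transaction that actually exercises the feature) has already collected what each server reports. Checking observed behavior, rather than the flag's setting, is what would catch the first two problems in the list above.

```python
"""Hypothetical production check: compare what each server should be running
(desired state) with what it reports (observed state)."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    version: str           # build the instance reports it is running
    feature_enabled: bool  # what a synthetic test observed, not what the GUI says


def find_drift(instances: list[Instance], desired_version: str,
               feature_should_be_on: bool) -> list[str]:
    """Return a human-readable problem for every mismatch found."""
    problems = []
    for inst in instances:
        if inst.version != desired_version:
            # e.g. an old instance the "reaper" process failed to turn off
            problems.append(f"{inst.name}: running {inst.version}, expected {desired_version}")
        if inst.feature_enabled != feature_should_be_on:
            # e.g. the flag was switched on, but the behavior is still "off"
            state = "on" if feature_should_be_on else "off"
            problems.append(f"{inst.name}: feature should be {state}")
    return problems


fleet = [
    Instance("web-1", "2.4.0", True),
    Instance("web-2", "2.3.9", True),   # stale instance
    Instance("web-3", "2.4.0", False),  # flag set, behavior unchanged
]
for problem in find_drift(fleet, desired_version="2.4.0", feature_should_be_on=True):
    print(problem)
# web-2: running 2.3.9, expected 2.4.0
# web-3: feature should be on
```

Run on a schedule against production, a check like this does not prevent the defects; it shortens the time between a bad rollout and someone noticing it, which is the "quickly" in "monitor, test and (quickly) fix."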
How to Change Software Testing for New Cloud Configurations