Note: These are some of my personal views on how to get to the cloud (economically). It’s not intended to be an indictment of the cloud. In fact, the cloud is amazing. But with great power comes great responsibility. AWS, Azure and GCP have transformed the way in which technology services are consumed. I’d like to help you avoid some of the expensive lessons I have learned. Hopefully, this article helps you put a better plan in place and you can reap the benefits of cloud computing.
If the cloud does cost too much, you’re probably doing it wrong.
Sorry if my editorial comments or the title of this article hurt your ego, but it’s really just to make a point. I commit to you, it’s not just clickbait. In fact, if at the end of this article you don’t think there is anything useful within, please contact me and we can discuss 1:1. I’d be happy to have a deeper conversation on the topic.
“The Cloud Is Cheaper.”
They, the ACME Company
If your cloud strategy consists of taking your on-premises workloads and simply moving them to AWS, Azure or Google Cloud Platform, you will most likely see your infrastructure bill go up. Lift & Shift is not a comprehensive cloud strategy. In order to benefit from the cloud economy, the applications you run in the cloud will need to be appropriately configured. Let’s start with a simple formula to help inform our discussion on cloud transformations.
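In symbols, the formula can be sketched roughly as follows. The notation is an assumption for illustration only: $TCO_{\text{on-prem}}$ and $TCO_{\text{cloud}}$ are the yearly costs to run each environment, $C_{\text{transform}}$ is the one-time transformation cost, and $n$ is the number of years over which the investment is judged.

```latex
n \cdot TCO_{\text{cloud}} + C_{\text{transform}} < n \cdot TCO_{\text{on-prem}}
```

When the inequality holds, the cloud is cheaper over those $n$ years. When it does not, the transformation has not yet paid for itself.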
The hypothesis that “the cloud is cheaper” is something many CIOs and CTOs accept as true before they embark on their cloud transformation. That hypothesis is loosely based on the formula above. Pitch decks and PowerPoint presentations have also helped to further gain their trust. There are some other variables in the formula we could consider (and we will), but before we complicate things, let’s talk about this simple equation.
Total Cost of Ownership vs. Transformation Costs
The distinction between these two variables (TCO and transformation costs) comes down to the cost to run your infrastructure versus the cost to build it. Total cost of ownership covers recurring costs that you pay year over year (YoY). Transformation costs occur just one time. Consider the following theoretical scenario from the ACME Company. Today, all of ACME’s infrastructure is on-premises in its own data centers. Last fiscal year, ACME spent $1M on technology infrastructure. We’ll refer to this as the status quo cost. ACME’s CIO wants to transform the company and go to the cloud. If the following formula is true, the journey to the cloud could be a good thing for the CIO’s career. It would mean the CIO would see a direct cost savings by transforming to the cloud.
What is Total Cost of Ownership?
TCO is made up of 2 general kinds of costs: direct costs and indirect costs. Direct costs are for things like hardware, software and people. Indirect costs are for things like outage impacts and break-fix work, the costs that are incurred indirectly.
Do you know how to estimate the direct costs to run a workload in the cloud?
Not all cloud services are created equal when it comes to the total cost of ownership. The cost of human capital to support cloud services varies by the skill set required and the complexity of the service. The industry has rallied around 3 general groups of cloud services: SaaS (software as a service), PaaS (platform as a service) and IaaS (infrastructure as a service). Generally speaking, the human capital cost to run technology goes down as you move up the stack from IaaS to PaaS to SaaS.
Each of the major public cloud suppliers has solutions in each of these service areas. For example, GCP has Compute Engine, Azure has Virtual Machines and Amazon has EC2. With these you can run your company using IaaS. AWS has Redshift, Azure has Synapse and GCP has BigQuery. These PaaS big data platforms can power an enterprise data lake of any size. Some services are more complex than others, and that is where the devil is in the details.
Back to our theoretical example. If the direct cost to ACME to run its workload in the cloud is less than $1M per year, there would be a financial benefit through transforming.
“The Cloud is really just a bunch of servers owned by another company that are run within their data center by their engineers. It’s not magic, they just do it pretty well.”
Jimmy Hurff
Please pardon the first-person quote above. It’s an attempt at humor, not that I’m all that quotable. Key point: managing the infrastructure in a corporate data center and running an IaaS environment in the public cloud have similar direct cost burdens. To say it another way, you’ll need about the same amount of manpower to support an IaaS environment as you do an enterprise infrastructure within a corporate data center.
For most companies, annual infrastructure spend is made up of both people costs and technology costs. In our fictitious ACME example, let’s assume the $1M annual spend is made up of $667K in people costs and $333K in technology costs. ACME has a technology team that cares for and feeds the infrastructure that runs its business. The team patches servers, updates firmware, rolls out upgrades, etc. Assuming status quo, next year’s people cost would again come in at $667K.
TCO Model | Status Quo | Cloud / IaaS |
People Cost | $667K | Same |
Tech Cost | $333K | ??? |
Let’s assume that running ACME’s workload in the cloud could be accomplished through a simple lift & shift migration to an IaaS model in one of the public clouds. In this scenario, the re-hosted assets would be managed in the same fashion as they are today…just in a different data center. Patches, firmware, upgrades. Same.
If ACME can secure the IaaS resources for $333K or less per year, a lift & shift would result in a lower cost technology platform after some period of return on investment to pay for the transformation costs. If not, some additional elements of modernization (reduced machine sizes, elasticity enablement, etc.) would be required in order for the transformation to the cloud to bring direct cost savings.
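The comparison above is simple arithmetic, and a small sketch makes the break-even point explicit. The people and technology figures are ACME’s from this article; the two IaaS quotes are hypothetical values picked only to show both outcomes, and the function names are illustrative.

```python
def annual_tco(people_cost: int, tech_cost: int) -> int:
    """Annual direct TCO as the sum of people and technology costs."""
    return people_cost + tech_cost


def lift_and_shift_delta(status_quo_tech: int, iaas_quote: int) -> int:
    """Yearly change in spend after a rehost, with people cost unchanged.

    Negative means the cloud is cheaper per year; positive means it costs more.
    """
    return iaas_quote - status_quo_tech


# ACME figures from the article, in dollars.
PEOPLE_COST = 667_000
TECH_COST = 333_000

status_quo = annual_tco(PEOPLE_COST, TECH_COST)

# Hypothetical IaaS quotes (assumptions, not real pricing).
for quote in (300_000, 450_000):
    delta = lift_and_shift_delta(TECH_COST, quote)
    cloud = annual_tco(PEOPLE_COST, quote)
    verdict = "saves" if delta < 0 else "adds"
    print(f"IaaS at ${quote:,}: ${cloud:,}/yr vs ${status_quo:,}/yr, "
          f"{verdict} ${abs(delta):,}/yr")
```

Because the people cost is held constant, the whole decision rests on whether the IaaS quote lands above or below the $333K technology line.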
An Educated Guess: Lift & shift of on-premises virtual machines, core for core and byte for byte, to run within the public cloud could be shockingly more expensive. In my experience, this has often been the case. There are ways to overcome this, but generally it involves more effort in the area of transformation. For example, during the transformation effort, a technology team could replatform applications to take advantage of cloud elasticity, and subsequently use fewer computing resources at idle times. Good hygiene, like turning test environments off at night, becomes a key lever to produce value.
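To put a rough number on the “turn test environments off at night” lever, here is a minimal sketch. It assumes compute is billed per running hour, uses a made-up hourly rate, and ignores storage, which typically keeps billing while machines are stopped.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def scheduled_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of the week an environment runs on a fixed schedule."""
    return (hours_per_day * days_per_week) / HOURS_PER_WEEK


def weekly_cost(hourly_rate: float, hours_per_day: float, days_per_week: int) -> float:
    """Weekly compute cost for an environment billed per running hour."""
    return hourly_rate * hours_per_day * days_per_week


# Hypothetical rate for a test environment; substitute your own figure.
RATE = 2.00  # dollars per running hour

always_on = weekly_cost(RATE, 24, 7)
weekdays_only = weekly_cost(RATE, 12, 5)
saved = 1 - scheduled_fraction(12, 5)

print(f"Always on:     ${always_on:,.2f}/week")
print(f"12h x 5 days:  ${weekdays_only:,.2f}/week")
print(f"Compute saved: {saved:.0%}")
```

Running 12 hours a day on weekdays covers 60 of the 168 hours in a week, so under these assumptions the compute bill for that environment drops by roughly two thirds.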
What Makes The Cloud So Great Anyway?
This leads us to why the cloud is great. There are 2 differentiating features of the cloud that make it great, or really just 1, because the 2 are the same thing seen from different perspectives: elasticity and effectively infinite horizontal scale. These are features you typically don’t have in your legacy data center. In order to leverage these differentiating features, you need to architect your cloud deployment properly.
A Strategy for Migration
The 6R Strategy for migration is commonly attributed to AWS, which extended Gartner’s earlier 5 Rs. In general, the strategy is made up of 6 different methods for migrating technology environments. Which method you choose will have an impact on the other variable in the simple equation: the Transformation Cost.
In terms of modernizing, the more an organization invests in transformation, the greater the direct cost optimization it can generally expect.
6R Strategy | Also Known As | Transformation Level of Effort | Direct Cost Optimization |
Retire | Shut It Down | – | $$$$$ |
Retain | Keep As Is | $ | – |
Rehost | Lift & Shift | $$ | $ |
Repurchase | Upgrade | $$$ | $$ |
Replatform | Lift, Tinker & Shift | $$$$ | $$$$ |
Refactor | Rewrite | $$$$$ | $$$$$ |
Shooting Ducks.
So what does this tell us? Well for one, retirement should be everyone’s objective (yes, that’s a double entendre). In addition to direct cost savings, turning systems off can also bring about indirect cost savings (fewer systems to break, fewer systems to support, etc.). Be vigilant here. Keep in mind, the 2nd Law of Thermodynamics is always in play. Not shutting down a system that is ripe for retirement only contributes to technical debt. Also, be aware that this is where you could run into turf wars and politics within your technology organization. Individuals, teams and even entire organizations are sometimes tied to systems that should be retired.
Refactoring is Hard, But Worth It (Possibly).
Re-architecting and rewriting systems to take full advantage of cloud services can potentially produce great cost savings. However, these technical projects can be very difficult to execute successfully. Remember ACME’s journey. Imagine a scenario where it costs $2M to modernize the ACME systems and convert them to take advantage of modern cloud features. In order to see a return on investment within your business stakeholders’ time frame, the YoY costs will need to fit their expectations.
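The return-on-investment question in the ACME scenario reduces to a payback calculation. The sketch below uses the $2M transformation cost and the $1M status quo cost from this article; the post-refactor yearly costs are hypothetical, chosen only to show how sensitive the payback period is.

```python
import math


def payback_years(transformation_cost: float,
                  status_quo_annual: float,
                  cloud_annual: float) -> float:
    """Years until yearly savings repay the one-time transformation cost.

    Returns math.inf when the cloud run cost is not lower than the status quo,
    because the investment is then never recovered.
    """
    annual_savings = status_quo_annual - cloud_annual
    if annual_savings <= 0:
        return math.inf
    return transformation_cost / annual_savings


TRANSFORMATION_COST = 2_000_000  # from the article
STATUS_QUO = 1_000_000           # from the article

# Hypothetical yearly cloud costs after refactoring (assumptions).
for cloud_cost in (600_000, 800_000, 1_000_000):
    years = payback_years(TRANSFORMATION_COST, STATUS_QUO, cloud_cost)
    print(f"Cloud at ${cloud_cost:,}/yr -> payback in {years} years")
```

Even a 40% cut in yearly cost takes 5 years to repay a $2M rewrite here, which is why the stakeholders’ time frame matters as much as the savings.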
The Gotchas
Estimates Are Always Better Than Actuals
This probably isn’t news to you if you’ve made it this far. One phenomenon that I’ve seen in multiple cloud migrations is in the simple graphic below. I call it the “Uh Oh” Moment. This is the time when you take a look at the cloud billing report and find a double whammy. Not only were your consultant’s estimates way off, but you also see that one of your well-intended, technically superior, early adopter engineering teams did something silly. They ran a quadruple extra large superduper Spark TPU/GPU mega AI/ML cluster in the Northern Asia Pacific FedRAMP region (I made that up) to test out how to crunch pi over the long Memorial Day weekend.
During their unmonitored experiment, they racked up some hefty compute charges you hadn’t planned for. They threw the cloud at it. Uh oh. Later, you’ll put up some guardrails so they can’t hurt themselves or you again, but you’ll need to navigate the response right now very carefully. You also need to figure out how to update the budget forecasts with the CFO. There is a fine line between swinging back to IaaS and being killed by the cloud billing report. I’d recommend you plan for the “Uh Oh” Moment. They usually happen. My advice: Build a program with your well-intended, technically superior, early adopter engineering teams to be beta users, with full visibility into and accountability for the billing report from day 1. Have them help you flesh out the cloud economics early in the migration journey.
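One simple guardrail is to project each team’s month-end spend from its run rate so far and flag anyone on track to blow the budget. The sketch below is only an illustration: the team names and figures are made up, the projection is a naive straight line, and in practice the month-to-date numbers would come from your provider’s billing export.

```python
from dataclasses import dataclass


@dataclass
class TeamSpend:
    team: str              # hypothetical team name
    monthly_budget: float  # agreed budget in dollars
    spend_to_date: float   # month-to-date charges in dollars


def projected_month_end(spend_to_date: float, day_of_month: int,
                        days_in_month: int = 30) -> float:
    """Linear projection of month-end spend from the run rate so far."""
    return spend_to_date / day_of_month * days_in_month


def teams_over_budget(teams, day_of_month, days_in_month=30):
    """Return (team, projection) pairs whose projected spend exceeds budget."""
    flagged = []
    for t in teams:
        projection = projected_month_end(t.spend_to_date, day_of_month, days_in_month)
        if projection > t.monthly_budget:
            flagged.append((t.team, projection))
    return flagged


# Made-up figures: ten days into a 30-day month.
teams = [
    TeamSpend("data-science", monthly_budget=10_000, spend_to_date=9_000),
    TeamSpend("web", monthly_budget=5_000, spend_to_date=1_000),
]

for team, projection in teams_over_budget(teams, day_of_month=10):
    print(f"{team}: on track for ${projection:,.0f} this month")
```

A check like this, run daily and shared with the teams themselves, turns the billing report from a month-end surprise into an early warning.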
Double The Infrastructure. Double The Charges.
There will definitely be a period of time during your migration where you’ll have both on-premises AND cloud infrastructure. Don’t forget about that in your business justification. For those of you who love formulas, it looks like this, and it can make for an awkward conversation with the CFO. My advice: make sure you account for some overlap.
Note: It may sound like “Why did we do this again? It seems like it’s twice as expensive.”
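The overlap is easy to underestimate, so here is a rough sketch of what a migration year might cost. It assumes each environment is billed at its full yearly rate for every month it is running, which is a simplification. The $1M on-premises figure is ACME’s from this article, while the cloud run cost and the month counts are hypothetical.

```python
def migration_year_spend(on_prem_annual: float, cloud_annual: float,
                         on_prem_months: int, cloud_months: int) -> float:
    """Run cost for a 12-month migration year.

    on_prem_months: months the data center is still being paid for
    cloud_months:   months the cloud environment is being paid for
    """
    return on_prem_annual * on_prem_months / 12 + cloud_annual * cloud_months / 12


def overlap_months(on_prem_months: int, cloud_months: int) -> int:
    """Months in which both environments are billed at the same time."""
    return max(0, on_prem_months + cloud_months - 12)


ON_PREM_ANNUAL = 1_000_000  # ACME status quo, from the article
CLOUD_ANNUAL = 900_000      # hypothetical cloud run cost

# Assumed timeline: cloud starts in month 4, data center closes after month 9.
spend = migration_year_spend(ON_PREM_ANNUAL, CLOUD_ANNUAL,
                             on_prem_months=9, cloud_months=9)
print(f"Overlap: {overlap_months(9, 9)} months")
print(f"Migration-year run cost: ${spend:,.0f} vs ${ON_PREM_ANNUAL:,} status quo")
```

With six months of overlap, the migration year costs well over the status quo before any transformation cost is added, which is the number the CFO will ask about.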
All Cloud Regions Are Not Equal
One of your engineers will come to you during the migration with a technical concern that sounds like “For the project, we need to be in the East US Region but all of the databases are in the East US Region #2.” This could be a big deal. Cloud suppliers have regional affinities just like you do. These affinities can make huge differences when it comes to the cost of services and the cost to transmit data between regions. My advice: make sure you have a good strategy around regional affinity for your cloud deployment.
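To see why a region mismatch can be a big deal, it helps to estimate the data transfer bill before the architecture is locked in. The sketch below uses a placeholder per-GB price and a made-up traffic volume. Real inter-region rates vary by supplier and region pair, so check the current price sheet.

```python
def monthly_transfer_cost(gb_per_day: float, price_per_gb: float,
                          days: int = 30) -> float:
    """Estimated monthly charge for moving data between two regions."""
    return gb_per_day * days * price_per_gb


# Placeholder values, not real pricing.
CROSS_REGION_PRICE = 0.02  # dollars per GB, hypothetical
DAILY_TRAFFIC_GB = 500     # app in one region, databases in another

cost = monthly_transfer_cost(DAILY_TRAFFIC_GB, CROSS_REGION_PRICE)
print(f"Cross-region traffic: about ${cost:,.2f}/month, ${cost * 12:,.2f}/year")
```

The same traffic kept inside one region would often cost far less, so co-locating chatty applications and their databases is usually the first thing to check.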
Expert Help & Professional Services
This article isn’t really intended to be an advertisement, but if you’re not sure where to start, hire an expert to guide you. There are many companies that provide expertise in this area, and they can help you avoid some expensive lessons. I own one of those companies; it’s called Cloud Tuner.