22 steps to reduce your AWS bill (Part 2 — Compute)

Dariusz Korzun
9 min read · Mar 12, 2021

A brief introduction to AWS cloud cost optimisation techniques.

This is Part 2 of the “22 steps to reduce your AWS bill” series. Please also check Part 1, Part 3, Part 4 and Part 5.

22 cloud cost optimization areas

AWS cloud cost optimization is a very broad topic with many different components acting as cost drivers that add up to your total bill. The optimization process is not limited to the list presented below, but these are the most common and most obvious sources of savings. Depending on your AWS architecture and the services you use, you should also explore additional methods that support your AWS cost management and can potentially lead to lower bills.

Compute

  • Terminate “Zombie assets”
  • Consolidate idle resources
  • Monitor utilization patterns
  • Match EC2 instance with workload needs
  • Scale horizontally
  • Scale vertically
  • Upgrade to the latest generation
  • Match Amazon RDS DB engine with workload needs
  • Select DB engine
  • Scale horizontally and vertically
  • Cache DB storage in-memory
  • Configure instance scheduling

Terminate “Zombie assets”

It is estimated that 12% of a typical cloud inventory consists of zombie assets. “Zombie assets” are unused but still active services contributing to your monthly bill. These can be:

  • development instances,
  • test instances,
  • training or demo instances,
  • temporary instances used for functionality checks,
  • or any other services provisioned but not utilized.

Anything you don’t use or are not planning to use in the future should be terminated. To fully optimize your cloud spend you must locate such instances as soon as possible.

Development and test instances should be shut down at the end of the workday and over weekends, when development teams are not working. Training, demo, and temporary instances must be terminated once a project is completed. When you need to start new resources with a predefined configuration at short notice, use AWS CloudFormation: by defining templates that describe resources and their dependencies, you can launch and configure them together as a stack, and later update or delete the entire stack as a single unit. This is the most obvious example of the infrastructure-as-code approach, thoroughly discussed on the Internet, and it helps you avoid creating zombie assets in the first place.
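Locating zombie candidates is easy to automate if temporary resources are tagged at creation time. Below is a minimal sketch, assuming a hypothetical tagging convention (an `expires` tag) and records shaped loosely like boto3’s `describe_instances` output, with tags flattened into a dict for brevity:

```python
from datetime import datetime, timezone

# Hypothetical instance records (in practice, fetched via boto3)
instances = [
    {"InstanceId": "i-0aaa", "Tags": {"purpose": "demo", "expires": "2021-01-31"}},
    {"InstanceId": "i-0bbb", "Tags": {"purpose": "prod"}},
    {"InstanceId": "i-0ccc", "Tags": {"purpose": "test", "expires": "2021-06-30"}},
]

def expired(instance, today):
    """An instance is a zombie candidate once its 'expires' tag is in the past."""
    expires = instance["Tags"].get("expires")
    return expires is not None and datetime.strptime(expires, "%Y-%m-%d").date() < today

today = datetime(2021, 3, 12, tzinfo=timezone.utc).date()
zombies = [i["InstanceId"] for i in instances if expired(i, today)]
print(zombies)  # instances flagged for review and termination
```

The same idea works for any resource type that supports tags; untagged resources are themselves worth flagging for review.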

Consolidate idle resources

The next step is to address idle resources. Back in the on-premises days, keeping some under-utilized resources around to handle workload spikes was a good strategy. Today it is much more effective to consolidate jobs onto fewer instances, or to switch to auto-scaling instead of leaving instances in wait mode: resources are there for you when you need them. Idle resources can be:

  • EC2 or VM instances,
  • Databases,
  • Load Balancers,
  • Containers.

An instance is considered idle when both conditions are met:

  • average CPU utilization has been lower than 2% for the last 7 days,
  • average network I/O has been lower than 5 MB for the last 7 days.

Remember that the cost of an idle instance is equal to the cost of a heavily utilized one with the same parameters. This is a tangible waste.
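The two thresholds above translate directly into a simple check. A sketch over hypothetical 7-day fleet averages:

```python
def is_idle(avg_cpu_pct, avg_network_mb):
    """Both conditions from the article: average CPU < 2% and
    average network I/O < 5 MB over the last 7 days."""
    return avg_cpu_pct < 2.0 and avg_network_mb < 5.0

# Hypothetical 7-day averages per instance: (CPU %, network MB)
fleet = {
    "i-web-1": (45.0, 820.0),   # busy web server
    "i-old-ci": (0.4, 1.2),     # forgotten CI runner
}
idle = [name for name, (cpu, net) in fleet.items() if is_idle(cpu, net)]
print(idle)  # consolidation or termination candidates
```

In practice the averages would come from CloudWatch; the thresholds are the ones stated above and can be tuned to your workloads.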

Monitor utilization patterns

There will be peaks and troughs in the utilization of your AWS resources, but they are not good indicators for making adjustments on the spot. Monitoring spread over time produces better results, filtering out short-term fluctuations and spotting emerging trends.

The most effective way to approach this task is to use heatmaps. Amazon QuickSight is AWS’s native BI tool offering such functionality. Although QuickSight is a paid tool, the possible savings justify the $18/month (annual plan) or $24/month (month-to-month) subscription. What you get in return is:

  • rich, customized dashboards with multiple different graphs on one page,
  • high granularity of reports (to the actual hour when an event took place),
  • grouping of data by any of the fields from the AWS Cost and Usage Report,
  • filtering by specific resource ID,
  • many more data dimensions.

Results of the analysis can be used to detect unusual behavior, predict utilization after running test load, or just to fine-tune your configuration after making cost optimization changes.

Match EC2 instance with workload needs

One of the most efficient ways to lower your AWS bill is to right-size your EC2 infrastructure. Right sizing is the process of matching instances to workload needs. It not only reduces cost but also helps you achieve peak performance from your resources. Instances should be reviewed at least once every six months to keep them matched to the real workload.

You start profiling each workload by evaluating the performance of your applications, e.g. with load testing tools. Based on the results you will be able to:

  • scale horizontally — identify the best instance family,
  • scale vertically — pick appropriate instance size.

AWS offers a selection of instances grouped into types, each optimized for a different use case. Each instance type comes in several sizes comprising varying combinations of CPU, memory, storage, and networking capacity, and different instance groups come at different costs.

Scale horizontally

Your workload’s primary function should determine which area you need to focus on. It may be cheaper to use one instance from a more expensive family optimized for a specific function than two or three cheaper, general-purpose instances. Consider the specific use case to determine the amount of memory and the type of processing unit needed. You can select a family from:

  • General purpose,
  • Compute optimized,
  • Memory optimized,
  • Accelerated computing,
  • Storage optimized.

You will find the specifications of each family at Amazon EC2 Instance Types.

To give you a taste of the process, below is a simplified comparison of instance types (US East Ohio, Dec 17, 2020), assuming you need 16 GB of memory and EBS-only storage. The right-hand column presents the total monthly cost of each instance. Review those costs carefully and note how they vary among families and instances.
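Such a comparison boils down to converting hourly on-demand rates into monthly cost. A sketch with illustrative rates for a few 16 GB, EBS-only instances (real prices vary by region and change over time):

```python
HOURS_PER_MONTH = 730  # common AWS pricing convention: 365 * 24 / 12

# Illustrative on-demand hourly rates, $/hour (assumptions, not a price list)
hourly = {
    "m5.xlarge": 0.192,   # general purpose, 4 vCPU / 16 GB
    "r5.large": 0.126,    # memory optimized, 2 vCPU / 16 GB
    "c5.2xlarge": 0.340,  # compute optimized, 8 vCPU / 16 GB
}
monthly = {k: round(v * HOURS_PER_MONTH, 2) for k, v in hourly.items()}
for name, cost in sorted(monthly.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} ${cost}/month")
```

Note how the memory-optimized r5.large meets the 16 GB requirement with the fewest vCPUs and the lowest monthly cost, which is exactly the kind of trade-off the comparison is meant to surface.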

Scale vertically

Instance types come in different sizes with variations in memory, virtual central processing units (vCPU), EBS burst bandwidth, and network performance, but one standard rule always holds:

  • if you double the capacity (scale your instance up by one size), your cost goes up by 100%;
  • if you downsize the capacity by one, you reduce cost by 50%.

By default, you should start lower and scale up to meet the demand of your application as it grows. You will find information about EC2 pricing at Amazon EC2 pricing.
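The doubling/halving rule above makes cost projections for a resize trivial to compute:

```python
def resize_cost(current_monthly, steps):
    """Each size step doubles (steps > 0) or halves (steps < 0)
    both capacity and cost."""
    return current_monthly * (2 ** steps)

base = 140.16  # hypothetical monthly cost of the current size, $/month
print(resize_cost(base, +1))  # one size up: cost doubles
print(resize_cost(base, -1))  # one size down: cost halves
```

This is why starting small and scaling up pays off: every avoided size step is a 50% difference on that instance’s line item.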

Upgrade to the latest generation

A family refresh should always be considered when Amazon Web Services releases a new generation of instances. New generations introduce features supporting specific services and usually offer better performance or improved functionality at a lower cost than their predecessors, so upgrading saves you money. You can either upgrade existing instances to the latest generation, or downsize within it to get the same level of performance at a lower cost. As an AWS cost optimization best practice, look out for announcements of latest-generation instances.

For example, switching from m5.large ($52.56/month) to m6g.large ($42.12/month) saves $10.44/month, and switching from db.r5.xlarge ($280.101/month) to db.r6g.xlarge ($250.609/month) saves $29.492/month, all with the better performance of each new server generation.
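Using the monthly prices from the example above, the savings from a generation switch are straightforward to tabulate:

```python
# Monthly on-demand prices from the article (US East, late 2020)
prices = {
    "m5.large": 52.56, "m6g.large": 42.12,
    "db.r5.xlarge": 280.101, "db.r6g.xlarge": 250.609,
}

def monthly_saving(old, new):
    """Saving per month when replacing `old` with the newer `new` generation."""
    return round(prices[old] - prices[new], 3)

print(monthly_saving("m5.large", "m6g.large"))          # EC2 example
print(monthly_saving("db.r5.xlarge", "db.r6g.xlarge"))  # RDS example
```

Multiplying by fleet size shows why generation upgrades are worth tracking: the same saving applies to every instance of that type.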

Match Amazon RDS DB engine with workload needs

Databases are among the key elements of AWS cloud infrastructure, and they can also be among the most expensive. As with EC2, you need to right-size their instances; again, right sizing is the process of matching DB instances to workload needs. Scaling up or down as requirements change is a must and should be automated with policies; otherwise you can end up under-provisioning or over-provisioning your resources. You will learn quickly when under-provisioning happens, but since over-provisioned instances have no direct impact on performance, you will likely miss them. As a result, you may be paying for computing power you don’t use.

Select DB engine

Changing the database engine is not a simple task, but it can have a big impact on cost, especially when you use Oracle or MS SQL Server. Decide whether you need Amazon RDS for MySQL / PostgreSQL / MariaDB / Oracle / SQL Server instances, or whether Amazon Aurora Serverless, a MySQL- and PostgreSQL-compatible relational database, will do. Amazon Aurora Serverless automatically scales up and down and shuts down during periods of inactivity. It also provides additional performance, reliability, and durability features important for enterprise-level applications. For other Amazon RDS editions, to optimize cost you need to manually create policies for:

  • Read replicas
  • Unused instances
  • Primary instance

and take care of:

  • Backup
  • Storage

There is also the option of hosting a database on an EC2 instance. It gives you a lot of flexibility in exchange for the overhead of deploying and maintaining the database the same way as on-premises, so in most cases it is not a recommended solution.

When migrating, make use of the AWS Database Migration Service. It is free for six months if you switch to native AWS DB engines. The service handles data replication and keeps the source and destination databases synchronized for as long as you choose. Additionally, the AWS Schema Conversion Tool automatically converts the source database schema and the majority of database code objects to a format compatible with the target database. SCT performs cloud-native code optimization by converting legacy Oracle and SQL Server functions to their AWS equivalents.

Scale horizontally and vertically

If you decide to go with Amazon RDS then your workload’s primary function should determine the right database instance. You can select a family from:

  • General purpose,
  • Memory Optimized.

For each family, you can further scale horizontally by choosing from the T or M instance types. Having two types makes the decision easier, but remember that the same standard rule as with EC2 still applies:

  • if you double the capacity (scale your instance up by one size), your cost goes up by 100%;
  • if you downsize the capacity by one, you reduce cost by 50%.

Before you make a final decision, monitor your database using CloudWatch Metrics for RDS. They measure CPU and memory consumption, disk space consumption, database connections, I/O operations per second (IOPS), and throughput. These metrics are very useful for understanding CPU, memory, and storage utilization. Finally, use the Trusted Advisor “Amazon RDS Idle DB Instances” check to identify DB instances that have not had any connection over the last 7 days, and stop them.
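The idle-DB check boils down to looking for instances with no connections over the window. A sketch over hypothetical daily connection counts, mirroring what the Trusted Advisor check looks at:

```python
# Hypothetical daily connection counts per DB instance, last 7 days
connections = {
    "orders-db":     [120, 98, 143, 101, 87, 12, 9],
    "legacy-report": [0, 0, 0, 0, 0, 0, 0],
}

# A DB with zero connections across the whole window is a stop candidate
idle_dbs = [db for db, daily in connections.items() if sum(daily) == 0]
print(idle_dbs)
```

In practice the counts would come from the RDS `DatabaseConnections` CloudWatch metric; stopped RDS instances still incur storage cost, but the instance-hour charges disappear.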

Cache DB storage in-memory

For workloads that continuously read from and write to a database, you will see growing data transfer costs and database performance degradation. You can reduce database load and data transfer cost by using Amazon ElastiCache. Although adding a cache layer increases complexity, in the long run the cost savings and improved service performance should be worth the effort. ElastiCache improves access times by keeping frequently accessed data in memory instead of retrieving it from the database.
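The usual pattern here is cache-aside: check the cache first and fall back to the database only on a miss. A minimal sketch where a plain dict stands in for ElastiCache (Redis/Memcached) and a function call stands in for the expensive query:

```python
cache = {}
db_reads = 0  # counts how often we actually hit the "database"

def query_db(key):
    """Stand-in for an expensive database lookup."""
    global db_reads
    db_reads += 1
    return f"row-for-{key}"

def get(key):
    if key not in cache:           # cache miss: query the database once...
        cache[key] = query_db(key)
    return cache[key]              # ...then serve repeats from memory

for _ in range(100):
    get("hot-item")
print(db_reads)  # the database was queried only once for 100 reads
```

A real implementation would add expiry (TTL) and invalidation on writes, which is where the added complexity mentioned above comes from.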

Configure instance scheduling

“Auto parking” refers to scheduling AWS resources to shut down during off-hours. If you have EC2 instances that are not used at fixed times, it is worth turning them on and off automatically; otherwise you pay for costly empty runs. Even a small change in work schedule can bring measurable benefits. For example, suppose you turn off an m5.2xlarge instance ($0.471 per hour) between 9 pm and 6 am. That is only 9 hours each day, but it cuts 37.5% from that instance’s bill.
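The savings arithmetic is simple: the fraction of the bill saved equals the fraction of the day the instance is parked. Using the m5.2xlarge example:

```python
def schedule_saving(hours_off_per_day):
    """Fraction of a 24/7 on-demand bill saved by parking an instance daily."""
    return hours_off_per_day / 24

hourly_rate = 0.471  # m5.2xlarge, $/hour (figure from the article)
off_hours = 9        # parked 9 pm - 6 am

print(f"{schedule_saving(off_hours):.1%}")     # share of the bill saved
print(round(hourly_rate * off_hours * 30, 2))  # dollars saved per 30-day month
```

Adding weekends (48 more off-hours per week) pushes the saving well past half the instance’s on-demand cost.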

“Infrastructure on-demand” is the term for resource capacity management offered by AWS Instance Scheduler. This tool enables you to configure custom start and stop schedules for EC2 and RDS instances.

Dariusz Korzun

Cloud and Big Data Solutions Manager | Altkom Software & Consulting