Series: RDS Rightsizing

RDS Rightsizing: A Practical Guide


Note: This document is a simplified, sanitized version of the actual domain blueprint used to train the RDS Rightsizing agent in our previous case study: Engineering AI Agents: Moving Beyond “Creative Statistics” to Build Pragmatic Dev Tools. The full, production-ready version lives securely in my client’s Confluence as part of their proprietary knowledge base and my personal consulting IP. Consider this a hands-on preview of how I translate engineering heuristics into structured, machine-ready logic.

This guide covers the basics of rightsizing classic RDS instances using standard CloudWatch metrics. We won’t go deep into Aurora (it has its own quirks, enough for a separate article), parameter group tuning, or exotic EC2 instance families. Think of this as a starting point — a solid foundation before you go hunting for more advanced optimizations.

Observations in this guide are based on analysis of over 700 RDS instances in production and staging environments.

Understanding Instance Naming

Before we dive into metrics, let’s decode instance names. Once you get the pattern, the AWS documentation becomes a lot less intimidating.

The format is: db.[family][generation][modifiers].[size]

The db. prefix

Just a label indicating that this EC2 instance is managed by RDS. Nothing to worry about.

Family

The family tells you what the instance is optimized for:

t — Burstable general purpose. Cheaper than m but with caveats for sustained load above ~50% CPU (more on this later).
m — General purpose. 4 GiB RAM per vCPU. The boring, reliable workhorse.
r — Memory optimized. 8 GiB RAM per vCPU. Your go-to for databases that live in RAM.
c — Compute optimized. 2 GiB RAM per vCPU. Not available for RDS, but you’ll see it in EC2.

Generation

Just a number. Higher is better — newer generations consistently deliver better performance per dollar, though occasionally the per-vCPU price may be marginally higher on the latest gen. In practice, staying on newer generations is almost always worth it.

Modifiers

The modifier sits between the generation number and the size. The most important ones — both for this guide and in general — describe the CPU architecture:

(none) or i — Intel. The default, most widely supported, and most expensive option.
a — AMD. Roughly 10% cheaper than Intel with identical architecture. Generally a safe swap.
g — Graviton (ARM). Around 20% cheaper than Intel. In practice, this works perfectly well for RDS. I’ve personally heard exactly one story about ARM performance issues — at a DB architects meetup, from someone who was squeezing every last drop out of a banking application database. Worth noting: that was back when Graviton on RDS was relatively new (6th generation), and I honestly don’t know if it would still be a problem today. For the vast majority of databases it’s a complete non-issue, and the 20% savings are hard to argue with.

There are other modifiers beyond CPU architecture — for example, d indicates a local NVMe SSD attached directly to the instance, which bypasses EBS entirely (and its associated limits). It’s fast, but ephemeral: that storage survives a reboot but not an instance migration or type change. There’s no need to memorize any of this — the AWS documentation is your friend when you encounter something unfamiliar.

Size

The size determines vCPU count. Ignoring the t family (which has micro, small, and medium sizes with 2 vCPUs but progressively less RAM — fine for very low traffic or test environments), the rest follows a clean pattern:

Size	vCPUs	RAM in GiB (r family example)
large	2	16
xlarge	4	32
2xlarge	8	64
4xlarge	16	128
16xlarge	64	512

The multiplier in the name is relative to xlarge. So $2\text{xlarge} = 2 \times \text{xlarge} = 8\ \text{vCPUs}$. RAM follows the family ratio.
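As a quick sanity check, the size pattern can be written down in a few lines. This is a minimal sketch, not an AWS API: the function names and the RAM_PER_VCPU_GIB table are my own, built from the ratios listed in the Family section, and the t family is deliberately left out because its small sizes break the pattern.

```python
# GiB of RAM per vCPU for each family, as listed in the Family section.
# The t family is omitted on purpose: its sizes do not follow this pattern.
RAM_PER_VCPU_GIB = {"m": 4, "r": 8, "c": 2}


def vcpus_for_size(size: str) -> int:
    """Return the vCPU count for a size name such as 'large' or '4xlarge'."""
    if size == "large":
        return 2
    if size == "xlarge":
        return 4
    if size.endswith("xlarge"):
        # '<N>xlarge' means N times the 4 vCPUs of an xlarge.
        return int(size[: -len("xlarge")]) * 4
    raise ValueError(f"size outside the regular pattern: {size!r}")


def ram_gib(family: str, size: str) -> int:
    """Return RAM in GiB, derived from the family's RAM-per-vCPU ratio."""
    return vcpus_for_size(size) * RAM_PER_VCPU_GIB[family]


print(vcpus_for_size("2xlarge"))  # 8
print(ram_gib("r", "16xlarge"))   # 512
```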

Example

db.r8gd.xlarge

Family: r — memory optimized
Generation: 8th
Modifiers:
- g — using Graviton CPU
- d — instance with local NVMe SSD drive
Size: xlarge — 4 vCPUs and 32 GiB of memory
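The same decoding can be done mechanically. The sketch below assumes the db.[family][generation][modifiers].[size] format described above and nothing more; the InstanceClass type and parse_instance_class function are hypothetical helpers of mine, and the regular expression does not try to validate that a given combination actually exists in RDS.

```python
import re
from typing import NamedTuple


class InstanceClass(NamedTuple):
    family: str      # e.g. "r"
    generation: int  # e.g. 8
    modifiers: str   # e.g. "gd" (may be empty)
    size: str        # e.g. "xlarge"


# db.[family][generation][modifiers].[size]
_PATTERN = re.compile(r"^db\.([a-z]+)(\d+)([a-z]*)\.([0-9a-z]+)$")


def parse_instance_class(name: str) -> InstanceClass:
    """Split an RDS instance class name into its four parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognizable instance class: {name!r}")
    family, generation, modifiers, size = match.groups()
    return InstanceClass(family, int(generation), modifiers, size)


print(parse_instance_class("db.r8gd.xlarge"))
# InstanceClass(family='r', generation=8, modifiers='gd', size='xlarge')
```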

MultiAZ vs Read Replicas: Not the Same Thing

This gets confused constantly, so let’s clear it up before touching any metrics.

MultiAZ is for availability, not performance

A MultiAZ setup creates an exact copy of your primary instance in a different Availability Zone. It doubles your cost and runs as a hot standby — invisible to your application under normal conditions. When the primary goes down (or during a maintenance window), RDS fails over to the standby automatically. That’s it. It does nothing for read throughput.

MultiAZ uses synchronous replication — every write must be confirmed on the standby before it’s acknowledged to the application. This means a failover won’t lose data: everything committed on the primary is guaranteed to be on the standby.

The flip side is that synchronous replication adds a small latency overhead to every write — the application has to wait for the cross-AZ round trip. In practice it’s a minor penalty and a reasonable price for proper DR, but it’s worth knowing it’s there.

Read replicas are for performance, not HA

A read replica is a separate database instance that receives data via asynchronous replication. This means:

There’s replication lag — the replica is always slightly behind the primary.
You can route read traffic to the replica to offload the primary.
The replica can be a different size than the primary.
It can live in the same AZ if that’s what makes sense.

Can you promote a replica to a new primary if the original dies? Technically yes. But it’s a manual process, you’ll need to reconfigure other replicas and update DNS, and there’s a risk of losing the most recent writes due to replication lag. It’s a last resort, not an HA strategy.

Practical implications

Non-production environments almost certainly don’t need MultiAZ. The cost is rarely justified outside of production.
A master with MultiAZ + 2 single-AZ read replicas behind an RDS Proxy can be a better trade-off than a master with MultiAZ + 1 MultiAZ replica — you get better read scalability at comparable or lower cost. That said, RDS Proxy configuration deserves its own article; there are nuances worth understanding before committing to that setup.
It’s also worth knowing that RDS MultiAZ Cluster exists as a separate deployment option — it combines the benefits of read replicas and MultiAZ in a single setup, and as a bonus it solves the separate-endpoint problem out of the box. It’s often a cheaper option than managing replicas and proxy separately, but it comes with its own architectural constraints. Another topic that deserves its own write-up.
More powerful routing solutions like pgpool-II can intelligently distribute read/write traffic across replicas, but they require dedicated infrastructure and have their own operational complexity. Separate article territory.

CPU Utilization

Use CPUUtilization from standard CloudWatch metrics. Observe over at least 1 month — 2 to 3 months gives a much clearer picture, especially for workloads with weekly or monthly cycles.

The baseline is never zero

A database instance always carries some baseline CPU load just to exist. The OS, the hypervisor, background processes — none of that is free. Depending on instance size, expect 1.5%–4% as a constant floor. If your CPU utilization never climbs much above that, the database is genuinely idle.

Replication has a cost

Replication isn’t magic — it consumes CPU on both the primary and the replica. Just maintaining the replication process adds roughly 1%–3% on top of the baseline, even with no actual writes to process. With active binary logs flowing, it’s more.

This matters when you’re evaluating whether to remove a read replica. Two replicas running at 20% each don’t simply merge into one at 40% CPU when you consolidate to one. The OS and replication overhead from the removed replica doesn’t carry over — it just disappears. In practice, two replicas at 20% CPU will consolidate to around 30%–35% on a single replica. Still well within a healthy range.
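The arithmetic behind that estimate can be sketched as follows. It is a deliberately naive model of my own: it treats the OS floor plus idle replication cost as a fixed per-instance overhead (using the 1.5%–4% and 1%–3% figures above as assumptions) and treats everything else as query work that moves to the surviving replica. With those inputs the estimate lands at 33%–37.5%, a little above the 30%–35% seen in practice; replication overhead is higher under real write load, which pulls the result down.

```python
def consolidated_cpu(replica_cpu_pct, overhead_pct):
    """Estimate CPU % on one remaining replica after removing the others.

    replica_cpu_pct: current CPUUtilization of each replica, in percent.
    overhead_pct: assumed fixed per-instance cost (OS floor + idle replication).
    """
    if not replica_cpu_pct:
        raise ValueError("need at least one replica")
    # Only the query work moves over; each removed replica's overhead vanishes.
    query_work = sum(cpu - overhead_pct for cpu in replica_cpu_pct)
    return query_work + overhead_pct


low_overhead = 1.5 + 1.0   # smallest floor + smallest replication cost
high_overhead = 4.0 + 3.0  # largest floor + largest replication cost

print(consolidated_cpu([20, 20], low_overhead))   # 37.5
print(consolidated_cpu([20, 20], high_overhead))  # 33.0
```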

💡 For Aurora, none of this applies — replication is handled at the storage layer and is independent of compute instances.

When to consider downsizing

If CPUUtilization stays consistently below 50% over your observation window with minimal spikes, you have a candidate for downsizing (or replica removal). The key word is consistently — a database that sits at 30% for weeks but spikes to 80% every Friday afternoon is a different conversation.

Spikes require judgment. Reducing CPU capacity means that when those spikes repeat — and they usually do, because most load patterns are cyclical — things will still work, just slower. Whether that’s acceptable depends entirely on context:

A nightly data import that takes 30 minutes instead of 10? Probably fine.
The morning login rush where every authentication suddenly takes 20% longer? Probably not fine.
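A first-pass filter for this rule might look like the sketch below. The 50% threshold comes from the text; the max_spike_share cutoff (5% of samples) and the three labels are assumptions of mine, and the "review-spikes" outcome is exactly the case where a human has to decide whether slower spikes are acceptable.

```python
def classify_cpu(samples, threshold_pct=50.0, max_spike_share=0.05):
    """Classify a CPUUtilization series (percent values) for downsizing.

    max_spike_share is an assumed cutoff: the share of samples allowed at or
    above the threshold before the instance is simply left alone.
    """
    if not samples:
        raise ValueError("no samples")
    spike_share = sum(1 for s in samples if s >= threshold_pct) / len(samples)
    if spike_share == 0:
        return "downsize-candidate"
    if spike_share <= max_spike_share:
        return "review-spikes"  # cyclical spikes: judge them in context
    return "leave-as-is"


steady = [30.0] * 100
friday_spikes = [30.0] * 98 + [80.0] * 2
print(classify_cpu(steady))         # downsize-candidate
print(classify_cpu(friday_spikes))  # review-spikes
```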

Burstable instances (t family): special rules

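The t family is a trap if you’re not paying attention. These instances deliver full advertised performance only up to a baseline CPU utilization, around 50% as a working rule (the documented baseline differs by instance size, so check the figure for yours). Above that, you’re running on burst credits.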

Burst credits accumulate over time and can be spent in short bursts. As a rough guideline, about 30 minutes of burst per day fits within the standard credit budget — and importantly, this holds approximately true regardless of instance size, because both earn rate and burn rate scale proportionally.

The danger zone:

Frequent or long spikes above 50%: you’ll exhaust credits and performance will throttle dramatically.
Unlimited burst mode: available as an option, but the cost is variable and can spiral quickly. Pricing for Linux instances is $0.05 per vCPU-hour for t2/t3 and $0.04 for t4g, charged for every hour spent in burst beyond the standard accumulated credits. Windows instances carry a higher rate, so factor that in if relevant. As a rough rule of thumb, if an instance is spending around 40%–50% of its time in burst, you’re at the breakeven point where switching to a comparable m-family instance becomes cheaper. The detailed mechanism and surplus credit calculations are covered in the AWS documentation: AWS EC2 Burstable Performance Instances.
💡 If your t-family instance is regularly spending significant time above 50% CPU, the honest answer is probably to move to an m or r family instance rather than trying to manage credits.
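To see where a breakeven like that comes from, here is a simplified model of my own, not AWS’s billing logic: it assumes every hour spent in burst is billed at the surplus rate for every vCPU, and ignores the credits earned along the way. The hourly prices in the example are made-up placeholders, so plug in real prices for your region and engine before drawing conclusions.

```python
def breakeven_burst_fraction(t_hourly, m_hourly, vcpus, surplus_rate=0.05):
    """Fraction of time in burst at which a t instance costs as much as m.

    Simplified model (an assumption): each burst hour is billed at
    surplus_rate per vCPU on top of the t instance's hourly price.
    """
    extra_per_burst_hour = vcpus * surplus_rate
    return (m_hourly - t_hourly) / extra_per_burst_hour


# Placeholder prices for illustration only, not real AWS prices.
fraction = breakeven_burst_fraction(t_hourly=0.10, m_hourly=0.15, vcpus=2)
print(f"{fraction:.0%}")  # 50%
```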

Memory

Database engines are greedy with RAM by design. Unused memory is wasted memory, so they’ll take as much as they can get for caching. This means a healthy database will always show low free memory — and that’s expected.

FreeableMemory: what you’re actually looking at

The FreeableMemory metric shows memory that’s in use but not critical — cache and buffers that can be reclaimed if needed. The question isn’t “is FreeableMemory low” but rather “is it lower than the baseline for this engine and instance size”.

PostgreSQL

Postgres on RDS (with default parameter group settings) reserves a significant portion of RAM on a fixed basis — shared buffers, index cache, and related structures. Based on observations across over 700 instances, the approximate baselines for default configurations are:

Instance RAM	Reserved (approx.)
2GB	~0.9GB
4GB	~1.7GB
8GB	~3.5GB
16GB	~7GB
32GB	~13GB

The pattern is roughly 40%–45% of total RAM locked in place. If FreeableMemory stays anchored at this baseline over a long period, the database isn’t really using the headroom, which is a strong signal you could drop down in size.
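The share implied by the table can be recomputed directly. The snippet below only restates the table values; the dictionary and function names are mine.

```python
# Instance RAM (GB) -> approximate reserved memory (GB), copied from the table.
POSTGRES_RESERVED_GB = {2: 0.9, 4: 1.7, 8: 3.5, 16: 7.0, 32: 13.0}


def reserved_share(ram_gb: int) -> float:
    """Return the reserved fraction of RAM for a size listed in the table."""
    return POSTGRES_RESERVED_GB[ram_gb] / ram_gb


for ram_gb in POSTGRES_RESERVED_GB:
    print(f"{ram_gb:>2} GB: {reserved_share(ram_gb):.1%} reserved")
```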

💡 These numbers apply to default RDS parameter group settings. If someone has tuned shared_buffers manually, all bets are off — but anyone making those changes in the parameter group already knows what they’re doing.

MySQL

MySQL shows lower reserved baselines, but the variance between configurations is significant enough that we’re not going to publish specific numbers here. The approach remains the same: watch the trend, not the absolute value. If FreeableMemory is stable and high over weeks, the instance has too much RAM.

Sizing decisions from memory metrics

FreeableMemory stable at baseline over months → strong candidate for downsizing, potentially by two size steps (e.g., 32GB → 8GB). Be careful, though: it’s safer to go down one size, leave it running for a while, and check again.
FreeableMemory slightly variable but consistently above baseline → consider downsizing one step (50% RAM reduction).
FreeableMemory fluctuating significantly → look more carefully before touching anything.
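The three rules above can be encoded roughly like this. Treat it as a sketch under loud assumptions: baseline_gb is whatever FreeableMemory level you consider normal for the engine and instance size, and the stable_band and variable_band tolerances (5% and 20% of the baseline) are numbers I picked for illustration, not thresholds from the fleet analysis.

```python
def memory_signal(freeable_gb, baseline_gb, stable_band=0.05, variable_band=0.20):
    """Map a FreeableMemory series (GB) onto the three sizing rules.

    stable_band and variable_band are assumed tolerances, expressed as a
    fraction of baseline_gb.
    """
    if not freeable_gb:
        raise ValueError("no samples")
    lowest, highest = min(freeable_gb), max(freeable_gb)
    spread = (highest - lowest) / baseline_gb
    near_baseline = abs(lowest - baseline_gb) <= stable_band * baseline_gb
    if spread <= stable_band and near_baseline:
        return "downsize-cautiously"  # stable at baseline: go one step, recheck
    if spread <= variable_band and lowest >= baseline_gb:
        return "downsize-one-step"    # slightly variable, always above baseline
    return "investigate"              # fluctuating: look closer first


print(memory_signal([19.0, 19.2, 18.9], baseline_gb=19.0))  # downsize-cautiously
print(memory_signal([20.0, 21.5, 22.0], baseline_gb=19.0))  # downsize-one-step
print(memory_signal([10.0, 19.0, 25.0], baseline_gb=19.0))  # investigate
```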

What basic metrics can’t tell you

Cache hit ratio — the real measure of whether your database is happy with its RAM allocation — requires Enhanced Monitoring or a Prometheus exporter. Standard CloudWatch won’t give you this. What it does give you is SwapUsage.

If SwapUsage is consistently zero or near zero, the instance has enough RAM for its workload: the database never had to push memory to disk.

If SwapUsage is non-trivial and growing, the engine is actively swapping pages — which is painful for performance. This is your signal that more RAM would let the database breathe. It won’t help you downsize, but it’ll stop you from making a bad situation worse.
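A crude way to tell "near zero" from "non-trivial and growing" is to compare the first and second half of the observation window. In the sketch below, the 50 MB trivial_mb cutoff and the half-window comparison are my own assumptions; the input is simply a list of SwapUsage values converted to MB, oldest first.

```python
from statistics import mean


def swap_signal(swap_mb, trivial_mb=50.0):
    """Classify a SwapUsage series (MB, oldest first).

    trivial_mb is an assumed cutoff below which swap is treated as noise.
    """
    if len(swap_mb) < 2:
        raise ValueError("need at least two samples")
    if max(swap_mb) <= trivial_mb:
        return "no-memory-pressure"
    half = len(swap_mb) // 2
    # Growing means the later half of the window averages higher than the earlier half.
    if mean(swap_mb[half:]) > mean(swap_mb[:half]):
        return "growing"
    return "present-but-flat"


print(swap_signal([0, 0, 1, 0]))         # no-memory-pressure
print(swap_signal([10, 100, 300, 600]))  # growing
```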

Disk Performance

Two metrics matter here: IOPS (operations per second) and throughput (MB/s). They measure different things:

IOPS — how many individual read/write operations the disk can handle per second.
Throughput — how much data moves per second.

For OLTP databases, IOPS is almost always the constraint. Out of ~700 instances analyzed, around 300 showed signs of IOPS pressure. Throughput was the bottleneck in approximately 2 cases. That ratio should calibrate your attention.

Throughput matters more for analytical workloads, data warehouses, or anything involving large sequential reads or writes — streaming, bulk imports, that sort of thing.

Storage types: gp3 vs io2

For RDS, you’re choosing between two relevant EBS volume types:

gp3 (General Purpose SSD) — The right choice for the vast majority of workloads. Good performance, predictable cost, and IOPS/throughput can be configured independently of volume size.
io2 (Provisioned IOPS SSD) — For systems that need sub-millisecond consistency at p99 latency, multi-attach, or extreme IOPS requirements. If you need io2, you’ve probably already exhausted most other optimization options and are operating in fairly specialized territory. io2 Block Express exists for the truly exotic requirements.

gp3 performance tiers

Volume size	Baseline IOPS	Baseline throughput	Max (configurable)
<400GB	3,000 IOPS	125 MiB/s	Fixed
≥400GB	12,000 IOPS	500 MiB/s	64,000 IOPS / 4,000 MiB/s

For volumes ≥400GB, IOPS and throughput can be scaled independently of each other and independently of volume size. This is useful — you don’t have to overprovision storage to get the performance you need.

This is the most significant improvement over the older gp2 volumes, where performance scaled rigidly with storage size: 3 IOPS per GB with a minimum of 100 IOPS and a maximum of 16,000 IOPS. On gp2, if you needed more IOPS you had to buy more disk — even if you didn’t need the space. gp3 decouples all of that. The baseline performance is also substantially higher for the same price, and the ceiling is 64,000 IOPS versus gp2’s 16,000 IOPS — though in practice, you’re unlikely to get anywhere near those limits on a typical RDS workload.

💡 Keep in mind that for volumes under 400GB, performance scaling is completely locked. If 3,000 IOPS is not enough, your only option is to bump the storage size. This isn’t just an arbitrary AWS pricing restriction; it stems from the underlying EBS infrastructure. Only when you hit the 400GB threshold does AWS start its “under-the-hood magic,” combining multiple physical SSD volumes into a single logical entity capable of scaling independently up to 64k IOPS.
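The gp2 formula and the gp3 tiers described above are easy to compare side by side. This sketch only encodes the numbers quoted in this section (3 IOPS per GB clamped to 100–16,000 for gp2, and the 400GB threshold for gp3 on RDS); the function names are mine.

```python
def gp2_baseline_iops(size_gb: int) -> int:
    """gp2: 3 IOPS per GB, floor of 100, ceiling of 16,000."""
    return min(max(3 * size_gb, 100), 16_000)


def gp3_baseline_iops(size_gb: int) -> int:
    """gp3 on RDS: 3,000 IOPS below 400GB, 12,000 IOPS from 400GB up."""
    return 3_000 if size_gb < 400 else 12_000


for size_gb in (20, 200, 400, 1000):
    print(size_gb, gp2_baseline_iops(size_gb), gp3_baseline_iops(size_gb))
# 20 100 3000
# 200 600 3000
# 400 1200 12000
# 1000 3000 12000
```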

The instance bottleneck trap

Here’s a mistake that shows up regularly: a team sees high IOPS, decides to provision more IOPS on the EBS volume, pays more, and sees no improvement. The reason is usually that the bottleneck isn’t the disk — it’s the network pipe between the EC2 instance and EBS.

Every EC2 instance has a maximum EBS bandwidth limit that’s independent of the volume’s capabilities. If you’re hitting the instance ceiling, upgrading the volume does nothing.

This is especially relevant for 2xlarge and smaller instances, where the limits are reached surprisingly quickly. Before scaling EBS configuration, check the instance limits:

Amazon EC2 Instance Types - EBS Optimized
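The trap boils down to a min() over two limits. In the sketch below both limits are inputs you look up yourself (the volume’s provisioned IOPS and the instance’s EBS IOPS limit from the AWS table); the numbers in the example are hypothetical and only there to show the shape of the problem.

```python
def effective_iops(volume_iops: int, instance_iops_limit: int) -> int:
    """The database can never get more IOPS than the smaller of the two limits."""
    return min(volume_iops, instance_iops_limit)


def bottleneck(volume_iops: int, instance_iops_limit: int) -> str:
    """Name the component that caps IOPS for this pairing."""
    return "instance" if instance_iops_limit < volume_iops else "volume"


# Hypothetical figures: paying for 20,000 IOPS on the volume while the
# instance tops out at 12,000 buys nothing.
print(effective_iops(20_000, 12_000))  # 12000
print(bottleneck(20_000, 12_000))      # instance
```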

Burst on smaller instances

Instances below a certain size have a baseline EBS bandwidth and a maximum (burst) bandwidth. The burst level is roughly equivalent to what a 4xlarge can sustain, but it’s available for only about 30 minutes per day via accumulated credits. Outside that window, you’re limited to the baseline.

This is separate from gp3 volume performance: the volume doesn’t burst. The instance’s network connection to EBS does.

💡 Newer instance generations typically bring improvements across the board — IOPS limits, network throughput, and EBS bandwidth all tend to increase. This means a generation upgrade can sometimes resolve a bottleneck without increasing instance size — and a generation upgrade is always cheaper than going up a size.

💡 For the 8th generation specifically: all sizes from large upwards have a baseline EBS IOPS limit above gp3’s 3,000 IOPS, but you need to reach 2xlarge before the instance can fully utilize a ≥400GB gp3 volume’s 12,000 IOPS baseline. Worth keeping in mind when matching instance size to storage configuration.

Using latency metrics

If TotalIOPS is well below the instance and volume limits but ReadLatency or WriteLatency is consistently high, that’s the scenario where io2’s sub-millisecond consistency starts to make sense. The problem isn’t the amount of I/O, it’s predictability.

Conversely, high latency combined with high TotalIOPS points to a capacity problem: you’re saturating the instance pipe, the volume, or both. io2 won’t help there.

Honest note: this specific pattern (high latency, low IOPS) is rare in practice. We haven’t seen enough cases to publish a concrete latency threshold. Treat it as a diagnostic direction rather than a hard rule.
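As a diagnostic direction, the two cases can be told apart like this. Since there is no published latency threshold, high_latency_ms has no default and must come from you; iops_limit is the lower of the instance and volume limits, and the 80% busy_share cutoff for "high IOPS" is an assumption of mine.

```python
def diagnose_io(total_iops, iops_limit, latency_ms, high_latency_ms, busy_share=0.8):
    """Point at the likely cause of slow I/O.

    iops_limit: the lower of the instance and volume IOPS limits.
    high_latency_ms: your own threshold for "consistently high" latency.
    busy_share: assumed fraction of the limit that counts as high IOPS.
    """
    if latency_ms < high_latency_ms:
        return "healthy"
    if total_iops >= busy_share * iops_limit:
        return "capacity"        # saturating instance pipe, volume, or both
    return "predictability"      # the rare case where io2 starts to make sense


print(diagnose_io(11_000, 12_000, latency_ms=8.0, high_latency_ms=5.0))  # capacity
print(diagnose_io(2_000, 12_000, latency_ms=8.0, high_latency_ms=5.0))   # predictability
```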

What to watch

Start with TotalIOPS — no need to split read/write initially.
If TotalIOPS is consistently high, check instance limits before touching EBS configuration.
Check ReadLatency / WriteLatency as a secondary signal, particularly if IOPS looks fine but queries are slow.
Throughput metrics are worth a glance for completeness, but don’t expect them to be the bottleneck.

Database Connections

This one is almost too simple to mention — and yet it’s surprisingly easy to overlook, especially with read replicas. Check the DatabaseConnections metric.

If an instance has registered zero connections — or suspiciously few — over a month-long observation window, it’s an obvious signal that the instance is unused. In most cases that means it’s unnecessary and can be removed. Occasionally it means there’s a misconfiguration somewhere in the services pointing at it, which is also worth knowing.

This matters especially for read replicas. Unlike Aurora (which provides a built-in reader endpoint), standard RDS gives each read replica its own address. Without an RDS Proxy or similar in front of them, traffic distribution is entirely up to whoever configured the application. If one replica is idle while others are struggling, that’s a routing problem worth investigating before drawing any conclusions about whether you need more or fewer replicas.

The most interesting — and surprisingly common — scenario is a master with one read replica where the replica receives zero connections while the master is working flat out. This almost always means the application has a single endpoint configured for both reads and writes. You can confirm this by checking ReadIOPS and ReadThroughput on the master: some read activity is expected (replication itself requires reads), but if read metrics are clearly and consistently higher than their write counterparts, the master is doing everything.

In that situation, there are two reasonable paths:

Remove the replica. It’s doing nothing except costing money.
Fix the root cause: route reads to the replica as originally intended, let things run for a while, then revisit the master’s metrics. There’s a reasonable chance the master, now handling only writes, becomes a candidate for downsizing too.
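The single-endpoint scenario is easy to flag automatically. The sketch below takes three plain lists of datapoints (DatabaseConnections on the replica, ReadIOPS and WriteIOPS on the master) however you choose to fetch them; the function name, the labels, and the simple mean comparison are assumptions of mine rather than a calibrated rule.

```python
from statistics import mean


def replica_routing_signal(replica_connections, primary_read_iops, primary_write_iops):
    """Flag a replica that gets no traffic while the master serves the reads."""
    if max(replica_connections) > 0:
        return "replica-in-use"
    # Replica saw zero connections for the whole window.
    if mean(primary_read_iops) > mean(primary_write_iops):
        return "primary-serves-reads"  # likely one endpoint for reads and writes
    return "replica-idle"              # unused, but the master is not read-heavy


print(replica_routing_signal([0, 0, 0], [900, 1100, 1000], [200, 250, 300]))
# primary-serves-reads
```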

Network Throughput

Standard CloudWatch provides NetworkReceiveThroughput and NetworkTransmitThroughput. For most RDS setups, client-to-database network traffic is lighter than database-to-disk traffic, so network is rarely the bottleneck.

That said, it can become one — particularly in cases with high read replica traffic, large result sets, or data-intensive integrations. If you’re seeing unexplained performance issues and IOPS/CPU/memory all look healthy, it’s worth checking whether the instance’s network interface is saturated. Instance network limits are in the same documentation table as EBS limits.

Where to Go From Here

This guide covers the fundamentals: the metrics available in standard CloudWatch, what they mean, and how to use them to make reasonable sizing decisions without over-engineering the analysis.

What this guide deliberately skips:

Aurora-specific architecture and its different scaling model
Cache hit ratio analysis (requires Enhanced Monitoring or Prometheus)
Parameter group tuning and its impact on memory behavior
Advanced proxy and routing solutions (RDS Proxy deep-dive, pgpool-II)
Unlimited burst cost calculations for t-family instances (beyond the breakeven overview above)
Exotic EC2 instance modifiers and their niche use cases

Most of those topics deserve their own write-up. The metrics covered here are enough to identify the obvious wins — and in a fleet of any meaningful size, there are usually quite a few of those.