**Note:** *This document is a simplified, sanitized version of the actual domain blueprint used to train the 
RDS Rightsizing agent in our previous case study: [Engineering AI Agents: Moving Beyond "Creative Statistics" 
to Build Pragmatic Dev Tools](/insights/engineering-ai-agents-moving-beyond-creative-statistics-to-build-pragmatic-dev-tools). The full, production-ready version lives securely in my client's Confluence as part of their proprietary knowledge base and my personal consulting IP. Consider this a hands-on preview of how I translate engineering heuristics into structured, machine-ready logic.*

This guide covers the basics of rightsizing classic RDS instances using standard CloudWatch metrics. We won't go 
deep into Aurora (it has its own quirks, enough for a separate article), parameter group tuning, or exotic EC2 
instance families. Think of this as a starting point — a solid foundation before you go hunting for more advanced 
optimizations.

Observations in this guide are based on analysis of over _700_ RDS instances in production and staging environments.

## Understanding Instance Naming

Before we dive into metrics, let's decode instance names. Once you get the pattern, the AWS documentation becomes 
a lot less intimidating.

The format is: `db.[family][generation][modifiers].[size]`

#### The db. prefix

Just a label indicating that this EC2 instance is managed by RDS. Nothing to worry about.

#### Family

The family tells you what the instance is optimized for:

* `t` — Burstable general purpose. Cheaper than `m` but with caveats for sustained load above ~50% CPU (more on this later).  
* `m` — General purpose. 4 GiB RAM per vCPU. The boring, reliable workhorse.  
* `r` — Memory optimized. 8 GiB RAM per vCPU. Your go-to for databases that live in RAM.  
* `c` — Compute optimized. 2 GiB RAM per vCPU. Not available for RDS, but you'll see it in EC2.

#### Generation

Just a number. Higher is better — newer generations consistently deliver better performance per dollar, though 
occasionally the per-vCPU price may be marginally higher on the latest gen. In practice, staying on newer generations 
is almost always worth it.

#### Modifiers

The modifier sits between the generation number and the size. The most important ones — both for this guide and in 
general — describe the CPU architecture:

* (none) or `i` — Intel. The default, most widely supported, and most expensive option.  
* `a` — AMD. Roughly 10% cheaper than Intel with identical architecture. Generally a safe swap.  
* `g` — Graviton (ARM). Around 20% cheaper than Intel. In practice, this works perfectly well for RDS. I’ve personally 
  heard exactly one story about ARM performance issues — at a DB architects meetup, from someone who was squeezing 
  every last drop out of a banking application database. Worth noting: that was back when Graviton on RDS was relatively 
  new (6th generation), and I honestly don’t know if it would still be a problem today. For the vast majority of databases 
  it’s a complete non-issue, and the 20% savings are hard to argue with.

There are other modifiers beyond CPU architecture — for example, `d` indicates a local NVMe SSD attached directly to 
the instance, which bypasses EBS entirely (and its associated limits). It’s fast, but ephemeral: that storage survives a 
reboot but not an instance migration or type change. There’s no need to memorize any of this — the AWS documentation 
is your friend when you encounter something unfamiliar.

#### Size

The size determines vCPU count. Ignoring the t family (which has micro, small, and medium sizes with 2 vCPUs 
but progressively less RAM — fine for very low traffic or test environments), the rest follows a clean pattern:

| Size | vCPUs | RAM in GiB (r family example) |
| :---- | :---- | :---- |
| large | 2 | 16 |
| xlarge | 4 | 32 |
| 2xlarge | 8 | 64 |
| 4xlarge | 16 | 128 |
| 16xlarge | 64 | 512 |

The multiplier in the name is relative to xlarge, so $\text{2xlarge} = 2 \times \text{xlarge} = 8\ \text{vCPUs}$. RAM follows the family ratio.

#### Example

`db.r8gd.xlarge`

* Family: `r` — memory optimized  
* Generation: 8th  
* Modifiers:  
  * `g` — using Graviton CPU  
  * `d` — instance with local NVMe SSD drive  
* Size: `xlarge` — 4 vCPUs and 32 GiB of memory
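
The decoding rules above can be sketched in a few lines of Python. This is a minimal illustration, not an official parser: the names `parse_instance_class`, `size_to_vcpus`, and `InstanceClass` are made up for this example, and it only covers the `m`, `r`, and `c` families and the `large`/`xlarge`/`Nxlarge` sizes described in this section.

```python
import re
from dataclasses import dataclass

# GiB of RAM per vCPU for the families described above (t is excluded on purpose).
RAM_PER_VCPU_GIB = {"m": 4, "r": 8, "c": 2}

# Matches db.[family][generation][modifiers].[size], e.g. db.r8gd.xlarge
_PATTERN = re.compile(r"^db\.([a-z]+?)(\d+)([a-z]*)\.(\w+)$")


@dataclass
class InstanceClass:
    family: str
    generation: int
    modifiers: str
    size: str
    vcpus: int
    ram_gib: int


def size_to_vcpus(size: str) -> int:
    """large = 2 vCPUs, xlarge = 4, Nxlarge = N * 4."""
    if size == "large":
        return 2
    if size == "xlarge":
        return 4
    match = re.fullmatch(r"(\d+)xlarge", size)
    if not match:
        raise ValueError(f"size not covered by this sketch: {size}")
    return int(match.group(1)) * 4


def parse_instance_class(name: str) -> InstanceClass:
    match = _PATTERN.match(name)
    if not match:
        raise ValueError(f"not an RDS instance class name: {name}")
    family, generation, modifiers, size = match.groups()
    if family not in RAM_PER_VCPU_GIB:
        raise ValueError(f"family not covered by this sketch: {family}")
    vcpus = size_to_vcpus(size)
    ram_gib = vcpus * RAM_PER_VCPU_GIB[family]
    return InstanceClass(family, int(generation), modifiers, size, vcpus, ram_gib)
```

Calling `parse_instance_class("db.r8gd.xlarge")` yields family `r`, generation 8, modifiers `gd`, 4 vCPUs, and 32 GiB, matching the breakdown above.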

## MultiAZ vs Read Replicas: Not the Same Thing

This gets confused constantly, so let's clear it up before touching any metrics.

### MultiAZ is for availability, not performance

A MultiAZ setup creates an exact copy of your primary instance in a different Availability Zone. It doubles your cost 
and runs as a hot standby — invisible to your application under normal conditions. When the primary goes down (or during 
a maintenance window), RDS fails over to the standby automatically. That's it. It does nothing for read throughput.

MultiAZ uses **synchronous** replication — every write must be confirmed on the standby before it’s acknowledged to the 
application. This means a failover won’t lose data: everything committed on the primary is guaranteed to be on the standby.

The flip side is that synchronous replication adds a small latency overhead to every write — the application has to wait 
for the cross-AZ round trip. In practice it’s a minor penalty and a reasonable price for proper DR, but it’s worth knowing 
it’s there.

### Read replicas are for performance, not HA

A read replica is a separate database instance that receives data via **asynchronous** replication. This means:

* There's replication lag — the replica is always slightly behind the primary.  
* You can route read traffic to the replica to offload the primary.  
* The replica can be a different size than the primary.  
* It can live in the same AZ if that's what makes sense.

Can you promote a replica to a new primary if the original dies? Technically yes. But it's a manual process, you'll 
need to reconfigure other replicas and update DNS, and there's a risk of losing the most recent writes due to 
replication lag. It's a last resort, not a HA strategy.

### Practical implications

* Non-production environments almost certainly don't need MultiAZ. The cost is rarely justified outside of production.  
* A master with MultiAZ + 2 single-AZ read replicas behind an RDS Proxy can be a better trade-off 
  than a master with MultiAZ + 1 MultiAZ replica — you get better read scalability at comparable 
  or lower cost. That said, RDS Proxy configuration deserves its own article; there are nuances worth understanding 
  before committing to that setup.  
* It’s also worth knowing that the RDS Multi-AZ DB cluster exists as a separate deployment option. It combines the benefits of 
  read replicas and MultiAZ in a single setup, and as a bonus it solves the separate-endpoint problem out of the box. 
  It’s often a cheaper option than managing replicas and proxy separately, but it comes with its own architectural 
  constraints. Another topic that deserves its own write-up.  
* More powerful routing solutions like pgpool-II can intelligently distribute read/write traffic across replicas, but 
  they require dedicated infrastructure and have their own operational complexity. Separate article territory.

## CPU Utilization

Use `CPUUtilization` from standard CloudWatch metrics. Observe over at least 1 month — 2 to 3 months gives a much clearer 
picture, especially for workloads with weekly or monthly cycles.
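
As a starting point, here is a minimal sketch of pulling that metric with the CloudWatch `get_metric_statistics` call. The helper name `fetch_cpu_utilization` is an assumption for this example, and the client is passed in as an argument (for instance `boto3.client("cloudwatch")`) so the function itself carries no AWS dependency.

```python
from datetime import datetime, timedelta, timezone


def fetch_cpu_utilization(cloudwatch, db_instance_id, days=30):
    """Return hourly (average, maximum) CPUUtilization pairs, oldest first.

    `cloudwatch` is expected to be a boto3 CloudWatch client.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        StartTime=start,
        EndTime=end,
        # Hourly points: 30 days * 24 = 720, under the 1,440-datapoint limit
        # of a single call. Longer windows need several calls or a larger period.
        Period=3600,
        Statistics=["Average", "Maximum"],
    )
    # Datapoints are not guaranteed to arrive in time order.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Average"], p["Maximum"]) for p in points]
```

Keeping both the average and the maximum per hour matters: averages hide exactly the short spikes that the sizing decision depends on.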

### The baseline is never zero

A database instance always carries some baseline CPU load just to exist. The OS, the hypervisor, background processes 
— none of that is free. Depending on instance size, expect 1.5%–4% as a constant floor. If your CPU utilization never 
climbs much above that, the database is genuinely idle.

### Replication has a cost

Replication isn't magic — it consumes CPU on both the primary and the replica. Just maintaining the replication process 
adds roughly 1%–3% on top of the baseline, even with no actual writes to process. With active binary logs flowing, 
it's more.

This matters when you're evaluating whether to remove a read replica. Two replicas running at 20% each don't simply
merge into one at 40% CPU when you consolidate to one. The OS and replication overhead from the removed replica doesn't 
carry over — it just disappears. In practice, two replicas at 20% CPU will consolidate to around 30%–35% on a single 
replica. Still well within a healthy range.
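
The consolidation arithmetic can be sketched as follows. The function name and the single `fixed_overhead_percent` parameter are assumptions for illustration: the parameter lumps together the OS baseline and the idle replication cost described above, and the model simply assumes each removed replica takes that overhead with it.

```python
def consolidated_cpu(replica_cpu_percents, fixed_overhead_percent):
    """Estimate CPU on a single replica after folding the others into it.

    The query load of every replica carries over; the fixed overhead
    (OS baseline + idle replication) of each removed replica does not.
    """
    if not replica_cpu_percents:
        raise ValueError("need at least one replica")
    removed_replicas = len(replica_cpu_percents) - 1
    return sum(replica_cpu_percents) - removed_replicas * fixed_overhead_percent


# Two replicas at 20% each, assuming 5% to 7% of fixed overhead per replica:
low = consolidated_cpu([20.0, 20.0], 7.0)   # 33.0
high = consolidated_cpu([20.0, 20.0], 5.0)  # 35.0
```

With those assumed overhead values the estimate lands in the same range as the figure quoted above. Treat it as a sanity check before the change, not as a prediction.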

💡 *For Aurora, none of this applies — replication is handled at the storage layer and is independent of compute instances.*

### When to consider downsizing

If `CPUUtilization` stays consistently below 50% over your observation window with minimal spikes, you have a candidate 
for downsizing (or replica removal). The key word is *consistently* — a database that sits at 30% for weeks but spikes to 
80% every Friday afternoon is a different conversation.

Spikes require judgment. Reducing CPU capacity means that when those spikes repeat — and they usually do, because most 
load patterns are cyclical — things will still work, just slower. Whether that's acceptable depends entirely on context:

* A nightly data import that takes 30 minutes instead of 10? Probably fine.  
* The morning login rush where every authentication suddenly takes 20% longer? Probably not fine.
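
A first-pass filter for this rule might look like the sketch below. The function name, the labels, and the `max_spike_share` cut-off are all hypothetical choices for this example. The 50% threshold comes from the text, while "minimal spikes" has no fixed definition and the 5% default is only a placeholder.

```python
def cpu_downsizing_signal(hourly_max_cpu, threshold=50.0, max_spike_share=0.05):
    """Classify an observation window of hourly maximum CPUUtilization values.

    Returns one of:
      "candidate"       - never reached the threshold
      "needs-judgment"  - reached it in a small share of hours (review the spikes)
      "not-a-candidate" - above the threshold too often to call them spikes
    """
    if not hourly_max_cpu:
        raise ValueError("empty observation window")
    spike_hours = sum(1 for value in hourly_max_cpu if value >= threshold)
    if spike_hours == 0:
        return "candidate"
    if spike_hours / len(hourly_max_cpu) <= max_spike_share:
        return "needs-judgment"
    return "not-a-candidate"
```

The Friday-afternoon database from the example above would come back as `"needs-judgment"`: the code can find the spikes, but only someone who knows the workload can say whether slowing them down is acceptable.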

### Burstable instances (t family): special rules

The t family is a trap if you're not paying attention. These instances deliver full advertised performance only up 
to around 50% CPU utilization — above that, you're running on burst credits.

Burst credits accumulate over time and can be spent in short bursts. As a rough guideline, about 30 minutes of burst 
per day fits within the standard credit budget — and importantly, this holds approximately true regardless of instance 
size, because both earn rate and burn rate scale proportionally.

The danger zone:

* Frequent or long spikes above 50%: you'll exhaust credits and performance will throttle dramatically.  
* Unlimited burst mode: available as an option, but the cost is variable and can spiral quickly. Pricing for Linux 
  instances is \$0.05 per vCPU-hour for t2/t3 and \$0.04 for t4g, charged for every hour spent in burst beyond the 
  standard accumulated credits. Windows instances carry a higher rate, so factor that in if relevant. As a rough rule 
  of thumb, if an instance is spending around 40%–50% of its time in burst, you’re at the breakeven point where 
  switching to a comparable m-family instance becomes cheaper. The detailed mechanism and surplus credit calculations 
  are covered in the AWS documentation: [AWS EC2 Burstable Performance Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode-concepts.html#unlimited-mode-surplus-credits).  

  💡 *If your t-family instance is regularly spending significant time above 50% CPU, the honest answer is probably to 
  move to an m or r family instance rather than trying to manage credits.*
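
To see where a breakeven like that comes from, here is a deliberately simplified sketch. It assumes every hour in burst is billed as a full vCPU-hour on every vCPU, which overstates the real surplus-credit charge, and it ignores the free burst covered by accumulated credits. The function name is made up, and the prices in the usage line are placeholders, not real AWS prices.

```python
def burst_breakeven_fraction(t_hourly_price, m_hourly_price, vcpus,
                             surplus_rate_per_vcpu_hour):
    """Share of hours in burst at which a t instance costs as much as an m instance.

    Simplified model: cost_t = t_price + share * vcpus * surplus_rate
                      cost_m = m_price
    """
    premium = m_hourly_price - t_hourly_price
    if premium <= 0:
        return 0.0  # the m instance is already no more expensive
    burst_cost_per_hour = vcpus * surplus_rate_per_vcpu_hour
    return min(1.0, premium / burst_cost_per_hour)


# Placeholder prices: t at 0.10/h, m at 0.145/h, 2 vCPUs, 0.05 per vCPU-hour.
share = burst_breakeven_fraction(0.10, 0.145, 2, 0.05)  # about 0.45
```

Plug in the current on-demand prices for your region and engine. The result moves noticeably with the price gap between the two families, so recheck it per instance size.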

## Memory

Database engines are greedy with RAM by design. Unused memory is wasted memory, so they'll take as much as they can get 
for caching. This means a healthy database will always show low free memory — and that's expected.

### FreeableMemory: what you're actually looking at

The `FreeableMemory` metric shows memory that is either free or in use but not critical: cache and buffers that can be reclaimed if needed. 
The question isn't _"is FreeableMemory low"_ but rather _"is it lower than the baseline for this engine and instance size"_.

#### PostgreSQL

Postgres on RDS (with default parameter group settings) reserves a significant portion of RAM on a fixed basis — 
shared buffers, index cache, and related structures. Based on observations across over 700 instances, the 
approximate baselines for default configurations are:

| Instance RAM | Reserved (approx.) |
| :---- | :---- |
| 2GB | ~0.9GB |
| 4GB | ~1.7GB |
| 8GB | ~3.5GB |
| 16GB | ~7GB |
| 32GB | ~13GB |

The pattern is roughly 41%–45% of total RAM locked in place, going by the table above. If `FreeableMemory` stays 
anchored at this baseline over a long period, the database isn't really using the headroom, which is a strong signal 
you could drop down in size.

💡 *These numbers apply to default RDS parameter group settings. If someone has tuned `shared_buffers` manually, 
all bets are off — but anyone making those changes in the parameter group already knows what they're doing.*
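
One way to turn the table into a check is sketched below. It rests on an interpretation that should be stated loudly: "anchored at the baseline" is read here as `FreeableMemory` staying close to total RAM minus the reserved amount, meaning nothing beyond the fixed reservation is being consumed. The function names and the 10% tolerance are assumptions for this example.

```python
# Approximate reserved memory for default PostgreSQL parameter groups,
# copied from the table above (instance RAM in GB -> reserved GB).
POSTGRES_RESERVED_GB = {2: 0.9, 4: 1.7, 8: 3.5, 16: 7.0, 32: 13.0}


def freeable_ceiling_gb(instance_ram_gb):
    """Highest FreeableMemory to expect when only the fixed reservation is in use."""
    return instance_ram_gb - POSTGRES_RESERVED_GB[instance_ram_gb]


def is_anchored_at_baseline(freeable_samples_gb, instance_ram_gb, tolerance=0.10):
    """True if FreeableMemory never dropped more than `tolerance` below the ceiling."""
    if not freeable_samples_gb:
        raise ValueError("empty observation window")
    ceiling = freeable_ceiling_gb(instance_ram_gb)
    return min(freeable_samples_gb) >= ceiling * (1 - tolerance)
```

For a 16GB instance the ceiling works out to about 9GB, so a window where `FreeableMemory` hovers between 8.7GB and 8.9GB counts as anchored, while a single dip to 5GB does not.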

#### MySQL

MySQL shows lower reserved baselines, but the variance between configurations is significant enough that we're not 
going to publish specific numbers here. The approach remains the same: watch the trend, not the absolute value. 
If `FreeableMemory` is stable and high over weeks, the instance has too much RAM.

#### Sizing decisions from memory metrics

* `FreeableMemory` stable at baseline over months → strong candidate for downsizing, potentially two size steps (e.g., 
  32GB → 8GB). Careful with that though – it’s safer to go down by one size, leave for some time and check again.  
* `FreeableMemory` slightly variable but consistently above baseline → consider downsizing one step (50% RAM reduction).  
* `FreeableMemory` fluctuating significantly → look more carefully before touching anything.

#### What basic metrics can't tell you

`Cache hit ratio` — the real measure of whether your database is happy with its RAM allocation — requires Enhanced 
Monitoring or a Prometheus exporter. Standard CloudWatch won't give you this. What it does give you is `SwapUsage`.

If `SwapUsage` is consistently zero or near zero, the instance is not short of RAM. The database never had to push 
memory to disk.

If `SwapUsage` is non-trivial and growing, the engine is actively swapping pages — which is painful for performance. 
This is your signal that more RAM would let the database breathe. It won't help you downsize, but it'll stop you 
from making a bad situation worse.

## Disk Performance

Two metrics matter here: IOPS (operations per second) and throughput (MB/s). They measure different things:

* IOPS — how many individual read/write operations the disk can handle per second.  
* Throughput — how much data moves per second.

For OLTP databases, IOPS is almost always the constraint. Out of ~700 instances analyzed, around 
300 showed signs of IOPS pressure. Throughput was the bottleneck in approximately 2 cases. 
That ratio should calibrate your attention.

Throughput matters more for analytical workloads, data warehouses, or anything involving large sequential reads 
or writes — streaming, bulk imports, that sort of thing.

#### Storage types: `gp3` vs `io2`

For RDS, you're choosing between two relevant EBS volume types:

* `gp3` (General Purpose SSD) — The right choice for the vast majority of workloads. Good performance, predictable cost, 
  and IOPS/throughput can be configured independently of volume size.  
* `io2` (Provisioned IOPS SSD) — For systems that need sub-millisecond consistency at _p99_ latency, multi-attach, 
  or extreme IOPS requirements. If you need io2, you've probably already exhausted most other optimization options and 
  are operating in fairly specialized territory. io2 Block Express exists for the truly exotic requirements.

#### `gp3` performance tiers

| Volume size | Baseline IOPS | Baseline throughput | Max (configurable) |
| :---- | :---- | :---- | :---- |
| \<400GB | 3,000 IOPS | 125 MiB/s | Fixed |
| ≥400GB | 12,000 IOPS | 500 MiB/s | 64,000 IOPS / 4,000 MiB/s |

For volumes ≥400GB, IOPS and throughput can be scaled independently of each other and independently of volume size. 
This is useful — you don't have to overprovision storage to get the performance you need.

This is the most significant improvement over the older `gp2` volumes, where performance scaled rigidly with storage 
size: 3 IOPS per GB with a minimum of 100 IOPS and a maximum of 16,000 IOPS. On `gp2`, if you needed more IOPS you 
had to buy more disk — even if you didn’t need the space. `gp3` decouples all of that. The baseline performance is also 
substantially higher for the same price, and the ceiling is 64,000 IOPS versus `gp2`’s 16,000 IOPS — though in practice, 
you’re unlikely to get anywhere near those limits on a typical RDS workload.
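
The two baseline rules are simple enough to write down directly. This sketch encodes only the figures quoted in this section (the `gp2` formula and the `gp3` table). The function names are made up for the example.

```python
def gp2_baseline_iops(size_gb):
    """gp2: 3 IOPS per GB, floor of 100, ceiling of 16,000."""
    return max(100, min(16_000, 3 * size_gb))


def gp3_baseline(size_gb):
    """gp3 on RDS: (baseline IOPS, baseline throughput in MiB/s) by volume size."""
    if size_gb < 400:
        return (3_000, 125)
    return (12_000, 500)


# A 200GB volume: gp2 gives 600 IOPS, gp3 gives 3,000 IOPS at baseline.
small_gp2 = gp2_baseline_iops(200)
small_gp3_iops, _ = gp3_baseline(200)
```

On `gp2`, reaching 3,000 IOPS takes a 1,000GB volume. On `gp3`, every volume starts there.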

💡 *Keep in mind that for volumes under 400GB, performance scaling is completely locked. If 3,000 IOPS is not enough, 
your only option is to bump the storage size. This isn't just an arbitrary AWS pricing restriction; it stems from 
the underlying EBS infrastructure. Only when you hit the 400GB threshold does AWS start its "under-the-hood magic," 
combining multiple physical SSD volumes into a single logical entity capable of scaling independently up to 64k IOPS.*

## The instance bottleneck trap

Here's a mistake that shows up regularly: a team sees high IOPS, decides to provision more IOPS on the EBS volume, 
pays more, and sees no improvement. The reason is usually that the bottleneck isn't the disk — it's the network pipe 
between the EC2 instance and EBS.

Every EC2 instance has a maximum EBS bandwidth limit that's independent of the volume's capabilities. If you're hitting 
the instance ceiling, upgrading the volume does nothing.

This is especially relevant for 2xlarge and smaller instances, where the limits are reached surprisingly quickly. Before 
scaling EBS configuration, check the instance limits:

[Amazon EC2 Instance Types \- EBS Optimized](https://docs.aws.amazon.com/ec2/latest/instancetypes/mo.html)
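
The underlying logic is a plain minimum: what the database sees is capped by whichever limit is lower. A minimal sketch, with made-up function names and with the instance limit left as an input you look up in the table linked above:

```python
def effective_iops(volume_iops, instance_max_iops):
    """IOPS the database can actually reach: the lower of the two limits wins."""
    return min(volume_iops, instance_max_iops)


def volume_upgrade_helps(current_volume_iops, target_volume_iops, instance_max_iops):
    """True only if provisioning more volume IOPS raises the effective ceiling."""
    before = effective_iops(current_volume_iops, instance_max_iops)
    after = effective_iops(target_volume_iops, instance_max_iops)
    return after > before


# Hypothetical instance capped at 10,000 IOPS:
wasted = volume_upgrade_helps(12_000, 20_000, 10_000)  # False: already capped
useful = volume_upgrade_helps(3_000, 12_000, 10_000)   # True: 3,000 -> 10,000
```

When the check returns `False`, the money is better spent on a larger or newer instance than on the volume.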

### Burst on smaller instances

Instances below a certain size have a baseline EBS bandwidth and a maximum (burst) bandwidth. The burst level is roughly 
equivalent to what a 4xlarge can sustain, but it's available for only about 30 minutes per day via accumulated 
credits. Outside that window, you're limited to the baseline.

This is separate from `gp3` volume performance — the volume doesn't burst. The instance network-to-EBS connection does.

💡 *Newer instance generations typically bring improvements across the board — IOPS limits, network throughput, and 
EBS bandwidth all tend to increase. This means a generation upgrade can sometimes resolve a bottleneck without increasing 
instance size — and a generation upgrade is always cheaper than going up a size.*

💡 *For the 8th generation specifically: all sizes from large upwards have a baseline EBS IOPS limit above `gp3`’s 
3,000 IOPS, but you need to reach 2xlarge before the instance can fully utilize a ≥400GB gp3 volume’s 12,000 IOPS 
baseline. Worth keeping in mind when matching instance size to storage configuration.*

#### Using latency metrics

If `TotalIOPS` is well below the instance and volume limits but `ReadLatency` or `WriteLatency` is consistently high — 
that's the scenario where io2's sub-millisecond consistency starts to make sense. The problem isn't volume, it's 
predictability.

Conversely, high latency combined with high `TotalIOPS` points to a capacity problem: you're saturating the instance 
pipe, the volume, or both. io2 won't help there.

Honest note: this specific pattern (high latency, low IOPS) is rare in practice. We haven't seen enough cases to 
publish a concrete latency threshold. Treat it as a diagnostic direction rather than a hard rule.
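
The two cases can be expressed as a small decision function. Because no concrete latency threshold is published here, the sketch forces the caller to supply one. The function name, the labels, and the 80% utilization cut-off are assumptions for illustration only.

```python
def diagnose_storage(total_iops, iops_limit, latency_ms, high_latency_ms,
                     high_utilization=0.8):
    """Separate a capacity problem from a predictability problem.

    iops_limit      - the lower of the instance and volume IOPS limits
    high_latency_ms - caller-supplied threshold; there is no universal value
    """
    if iops_limit <= 0:
        raise ValueError("iops_limit must be positive")
    latency_is_high = latency_ms >= high_latency_ms
    near_limit = total_iops / iops_limit >= high_utilization
    if latency_is_high and near_limit:
        return "capacity"        # saturating the instance pipe, the volume, or both
    if latency_is_high:
        return "predictability"  # the rare case where io2 starts to make sense
    return "no-latency-issue"
```

The value of writing it down is mostly in the ordering: check how close you are to the limits first, and only then read anything into the latency numbers.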

### What to watch

* Start with `TotalIOPS` — no need to split read/write initially.  
* If `TotalIOPS` is consistently high, check instance limits before touching EBS configuration.  
* Check `ReadLatency` / `WriteLatency` as a secondary signal, particularly if IOPS looks fine but queries are slow.  
* Throughput metrics are worth a glance for completeness, but don't expect them to be the bottleneck.

## Database Connections

This one is almost too simple to mention, and yet it’s surprisingly easy to overlook, especially with read replicas. 
Check the `DatabaseConnections` metric.

If an instance has registered zero connections — or suspiciously few — over a month-long observation window, 
it’s an obvious signal that the instance is unused. In most cases that means it’s unnecessary and can be removed. 
Occasionally it means there’s a misconfiguration somewhere in the services pointing at it, which is also worth knowing.

This matters especially for read replicas. Unlike Aurora (which provides a built-in reader endpoint), standard RDS 
gives each read replica its own address. Without an RDS Proxy or similar in front of them, traffic distribution is 
entirely up to whoever configured the application. If one replica is idle while others are struggling, that’s a routing 
problem worth investigating before drawing any conclusions about whether you need more or fewer replicas.

The most interesting — and surprisingly common — scenario is a master with one read replica where the replica receives 
zero connections while the master is working flat out. This almost always means the application has a single endpoint 
configured for both reads and writes. You can confirm this by checking `ReadIOPS` and `ReadThroughput` on the master: some 
read activity is expected (replication itself requires reads), but if read metrics are clearly and consistently higher 
than their write counterparts, the master is doing everything.

In that situation, there are two reasonable paths:

* Remove the replica. It’s doing nothing except costing money.  
* Fix the root cause: route reads to the replica as originally intended, let things run for a while, then revisit 
  the master’s metrics. There’s a reasonable chance the master, now handling only writes, becomes a candidate for 
  downsizing too.
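
The idle-replica pattern is easy to screen for across a fleet. In this sketch the function names are hypothetical, and so is the `ratio` parameter: "clearly and consistently higher" has no fixed definition, so 1.5 is only a placeholder.

```python
from statistics import mean


def replica_looks_unused(connection_samples, max_connections=0):
    """True if DatabaseConnections never exceeded max_connections in the window."""
    if not connection_samples:
        raise ValueError("empty observation window")
    return max(connection_samples) <= max_connections


def master_serves_reads(read_iops_samples, write_iops_samples, ratio=1.5):
    """True if average ReadIOPS on the master clearly exceeds average WriteIOPS."""
    if not read_iops_samples or not write_iops_samples:
        raise ValueError("empty observation window")
    return mean(read_iops_samples) > ratio * mean(write_iops_samples)


def single_endpoint_suspected(replica_connections, master_read_iops, master_write_iops):
    """Idle replica plus a read-heavy master: reads are probably hitting the master."""
    return (replica_looks_unused(replica_connections)
            and master_serves_reads(master_read_iops, master_write_iops))
```

A positive result is a prompt to look at the application's connection settings, not a verdict. Monitoring agents and health checks can also hold a connection or two open, which is why `max_connections` is adjustable.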

## Network Throughput

Standard CloudWatch provides `NetworkReceiveThroughput` and `NetworkTransmitThroughput`. For most RDS setups, 
client-to-database network traffic is lighter than database-to-disk traffic, so network is rarely the bottleneck.

That said, it can become one — particularly in cases with high read replica traffic, large result sets, or data-intensive integrations. If you're seeing unexplained performance issues and IOPS/CPU/memory all look healthy, it's worth checking whether the instance's network interface is saturated. Instance network limits are in the same documentation table as EBS limits.

## Where to Go From Here

This guide covers the fundamentals: the metrics available in standard CloudWatch, what they mean, and how to use them to make reasonable sizing decisions without over-engineering the analysis.

What this guide deliberately skips:

* Aurora-specific architecture and its different scaling model  
* Cache hit ratio analysis (requires Enhanced Monitoring or Prometheus)  
* Parameter group tuning and its impact on memory behavior  
* Advanced proxy and routing solutions (RDS Proxy deep-dive, pgpool-II)  
* Unlimited burst cost calculations for t-family instances (beyond the breakeven overview above)  
* Exotic EC2 instance modifiers and their niche use cases

Most of those topics deserve their own write-up. The metrics covered here are enough to identify the obvious wins — and in a fleet of any meaningful size, there are usually quite a few of those.