Latency vs Throughput

Fast single requests and high aggregate volume are related goals, but optimizing one can hurt the other: batching and deep queues raise throughput while adding per-request delay.

Latency

Time for a single request to complete, usually reported as percentiles (p50/p95/p99). Users feel tail latency most.
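A minimal sketch of how those percentiles can be computed from raw samples, using a nearest-rank definition; the latency numbers are illustrative, not from any real service:

```python
# Sketch: latency percentiles from a list of samples (milliseconds).
# Uses a simple nearest-rank definition; real monitoring systems often
# use interpolation or streaming sketches instead.

def percentile(samples, p):
    """Nearest-rank percentile: value at the p-th percent of sorted samples."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 14, 13]  # one slow outlier
for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
```

Note how a single 250 ms outlier leaves p50 untouched but dominates p95 and p99, which is why averages hide the pain users actually feel.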

Throughput

Work completed per unit time (requests/sec, bytes/sec). Often limited by CPU, IO, or contention.
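Little's Law ties the two metrics together: average concurrency equals throughput times average latency. A tiny sketch with illustrative (assumed) numbers:

```python
# Sketch: Little's Law, mean_in_flight = throughput * mean_latency.
# The figures below are assumptions for illustration, not measurements.

throughput_rps = 200      # requests completed per second
mean_latency_s = 0.05     # 50 ms average time in the system
in_flight = throughput_rps * mean_latency_s
print(in_flight)          # average number of requests in flight
```

This is useful for sizing: at 200 req/s and 50 ms latency, the system holds about 10 requests in flight on average, so a concurrency limit far below that will cap throughput.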

Queueing

As utilization approaches 100%, queues build and latency spikes. Keep headroom for bursts.
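The spike is nonlinear. For a simple M/M/1 queue, mean queueing delay grows as ρ/(1−ρ) service times, so going from 90% to 99% utilization multiplies waiting by roughly 11×. A sketch under that model, with an assumed 10 ms service time:

```python
# Sketch: mean queueing wait for an M/M/1 queue, W_q = S * rho / (1 - rho),
# where S is mean service time and rho is utilization. Illustrative model
# only; real systems have burstier arrivals and multiple servers.

def mm1_mean_wait(service_time_s, rho):
    """Mean time spent waiting in queue (excluding service) for M/M/1."""
    assert 0 <= rho < 1, "queue is unstable at rho >= 1"
    return service_time_s * rho / (1 - rho)

service = 0.010  # 10 ms mean service time (assumed)
for rho in (0.5, 0.9, 0.99):
    print(f"utilization {rho:.0%}: mean wait {mm1_mean_wait(service, rho) * 1000:.0f} ms")
```

At 50% utilization the wait is one service time (10 ms); at 99% it is 99 service times (990 ms), which is the "keep headroom" argument in numbers.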

Backpressure

Push back on producers (shed load, rate limit, bounded queues) to protect downstream systems.

Practical Tips

  • Track p50/p95/p99; p99 drives user experience and incident pain.
  • Bound queues and add timeouts; unbounded queues and retries turn brief overload into latency collapse.
  • Use load shedding when overloaded to keep the system responsive for some users.
  • Prefer idempotent operations so retries don’t amplify failures.
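Several of these tips combine naturally in a retry wrapper: bounded attempts, a per-attempt timeout, and backoff with jitter to avoid retry storms. A sketch assuming the wrapped call is idempotent; `call_with_retries` and the `flaky` stand-in are hypothetical names, not a real library API:

```python
# Sketch: bounded retries with per-attempt timeouts and exponential
# backoff plus full jitter. Safe only because the call is idempotent.

import random
import time

def call_with_retries(call, attempts=3, timeout_s=0.5, base_backoff_s=0.1):
    """Retry an idempotent call a bounded number of times."""
    for attempt in range(attempts):
        try:
            return call(timeout=timeout_s)   # per-attempt timeout
        except TimeoutError:
            if attempt == attempts - 1:
                raise                        # bounded: give up, don't loop forever
            # exponential backoff with full jitter spreads retries out
            time.sleep(random.uniform(0, base_backoff_s * 2 ** attempt))

# Usage with a stand-in call that times out twice, then succeeds:
calls = {"n": 0}
def flaky(timeout):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError
    return "ok"

print(call_with_retries(flaky))
```

Because attempts are bounded and jittered, a downstream outage produces a short, spread-out burst of retries rather than a synchronized storm that keeps the dependency down.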