how-the-postgresql-query-planner-chooses-between-an-index-sc

---
title: "How the PostgreSQL Query Planner Chooses Between an Index Scan and a Seq Scan in Under a Millisecond"
description: "An inside look at the cost model that drives PostgreSQL's scan decisions, using only statistics and fast arithmetic."
tags: ["postgresql", "query-planner", "index-scan", "seq-scan", "internals"]
date: "2026-06-24"
---

The planner does not contain a rule like “use an index if fewer than 5% of rows match.” That number is a myth. What you see as a hard threshold is actually the shadow of three cost constants, a correlation statistic, and a lot of fast multiplication.

PostgreSQL weighs a sequential scan against an index scan the same way you would choose between reading an entire book and flipping through a card catalog. The decision happens in microseconds because it never touches a disk. It never estimates cache warmth. It never simulates I/O timing. It just multiplies a few integers and compares the results. That sounds simplistic. It is. And it is remarkably effective.

## A Bookshelf Analogy

Imagine a library where all the books sit on endless shelves in no particular order. Each shelf holds exactly one 8 KB page of book content. You have a question: find every book that mentions “wormholes.” You can do this two ways.

**Method A:** Start at the first shelf and read every page. You move physically through the library in order. The assistant pushes a cart and catches up with you as you go. On a spinning disk this is fast because the shelf order matches the platter order. On an SSD it is still fast because the OS can issue huge read requests.

**Method B:** Consult a card catalog that lists every occurrence of “wormholes” sorted by page number. The catalog is organized as a B‑tree. You find the first entry in a few tree descents and then scan a contiguous range of cards. That part is cheap. But then you must go to each referenced shelf. If the pages corresponding to “wormholes” are scattered across 4,000 different shelves, you’ll be running all over the library. The catalog told you where the books are but not that the walk will be painful.

That physical layout problem is what PostgreSQL models with the **correlation** statistic. Correlation measures how closely the heap’s physical order matches the index’s logical order. When correlation is high, those 4,000 pages sit close together on the shelves, and the walk feels almost like a tour. When it is low, the walk is random and brutal.

The planner turns this physical intuition into a handful of unitless cost numbers. No wall‑clock time. No buffer cache simulation. Just guesses that are right often enough.

## The Cost Model in One Sentence

PostgreSQL uses a **cost‑based optimizer**, not a rule‑based one. For every table in a query, it generates a small set of candidate access paths, computes a synthetic cost for each, and keeps only the best.[1][2][8][13]

### Where the Numbers Come From

The planner reads two kinds of state: statistics and configuration.

**Statistics** are stored in `pg_class` (relation‑level) and `pg_statistic` (per‑column).[4][7] The two values that matter most for scan costing are:

- `relpages` – the total number of disk pages the table occupies
- `reltuples` – the estimated row count

The `ANALYZE` command refreshes these numbers by sampling up to 30,000 rows by default, even for terabyte tables.[7] It also builds per‑column histograms, most‑common‑value lists, and the correlation coefficient. All of this comes from sampling, not a full scan. The planner trusts these numbers because cost comparisons are surprisingly tolerant of estimation errors: being off by an order of magnitude in row count rarely flips a plan from an obviously bad one to a good one.[4][7]

**Configuration** is four global cost constants that let you describe your hardware to the planner:[2][6]

| Constant | Default | Meaning |
|---|---|---|
| `seq_page_cost` | 1.0 | Cost of reading one page sequentially |
| `random_page_cost` | 4.0 | Cost of a random page fetch |
| `cpu_tuple_cost` | 0.01 | CPU cost to process one row |
| `cpu_operator_cost` | 0.0025 | CPU cost to evaluate one operator |
| `cpu_index_tuple_cost` | 0.005 | CPU cost to process one index entry |

The defaults model spinning disks: reading a page randomly costs four times as much as reading it in a stream. On an SSD you would lower `random_page_cost` toward 1.0. The interplay between these numbers is what creates the illusion of a percentage threshold.

### Enumerating the Paths

When a query reaches the optimizer, the planner first constructs a **sequential scan path** for each table. It then iterates over every index and decides which ones can satisfy part of the `WHERE` clause.[13] For each usable index it may generate:

- A plain index scan that fetches every matching heap tuple.
- A bitmap index scan that collects tuple IDs into a bitmap and then visits heap pages in physical order.[10]
- An index‑only scan if the index covers all columns needed by the query and the visibility map allows skipping heap pages.[7][10]

Each path carries a startup cost (work before the first row can be returned) and a total cost (all work to finish the scan). A query with `LIMIT` may prefer a path with low startup cost even if its total cost is higher, because the planner can interpolate an “effective cost” for only the required prefix of rows.[11]

The planner prunes paths aggressively. A new path is kept only if it is not dominated: no existing path has both lower total cost and lower startup cost while offering at least as good an output ordering.[13] For a single‑table query, the final set of candidates is typically a sequential scan plus one or two index scans. This pruning keeps the search space tiny and the planning time far below a millisecond.

## How a Sequential Scan Is Costed

A sequential scan in PostgreSQL has zero startup cost. The run cost is the simple sum of disk I/O and CPU for every row in the table.[8] Implementation‑wise it looks like the pseudocode in `cost_seqscan`:[15]

run_cost = cpu_per_tuple * ntuples

seq_page_cost * npages; total_cost = startup_cost + run_cost;


`cpu_per_tuple` is `cpu_tuple_cost + cpu_operator_cost` times the number of filter operators. For a 5‑million‑row table on 10,000 pages with default constants that gives roughly 72,500 cost units. The number itself is meaningless. What matters is how it compares to the numbers produced by the index paths.

## How an Index Scan Is Costed

The index cost estimation has three layers: index access, heap access, and correlation blending.[5]

### Index Access

The planner first computes `indexSelectivity`, the fraction of table rows that will satisfy the index conditions.[4][5] It does this by calling `clauselist_selectivity` on the predicate list. For an equality predicate on a column with 1,000 distinct values, the selectivity would be about 0.001.

From that fraction the planner estimates how many index tuples and index pages will be visited. Index pages are assumed to be read sequentially, so their I/O cost is `seq_page_cost * indexPagesVisited`. CPU cost for processing each index entry is `cpu_index_tuple_cost` plus the eval cost of the index conditions.[5] This yields the index‑specific part of the startup and total cost.

### Heap Access

The heart of the decision lives in the heap access cost. The planner must estimate how many distinct heap pages will be touched and how expensive those touches are.

The worst‑case assumption is that every index entry points to a different heap page, and each fetch is a random access. In that world the I/O cost would be `random_page_cost * estimatedHeapPages`. But if the row order is correlated with the index order, the same heap pages are visited repeatedly or in sequence. PostgreSQL models this by interpolating between two extremes: perfectly sequential page access (cost ≈ `seq_page_cost` per page) and fully random access (cost ≈ `random_page_cost` per page).[16]

The interpolation uses the square of the correlation coefficient, `indexCorrelation²`.[16] A correlation of 0.9 gives 0.81 weight toward the sequential side; a correlation of 0.1 gives almost zero weight. This is why a table built with `CREATE INDEX ... (time_column)` after a `CLUSTER` can turn an expensive index scan into the planner’s favorite: high correlation shrinks the effective heap I/O cost dramatically.

### Putting It Together

The index scan total cost is the sum of index I/O, index CPU, heap I/O, and heap CPU. Heap CPU is charged only on the matching rows at `cpu_tuple_cost` per row. The startup cost includes descending the B‑tree to the first leaf page, which for a tiny selectivity can be non‑trivial relative to a sequential scan.

In the earlier 5‑million‑row example, if the query selects 5,000 rows, they fall on 4,000 distinct heap pages, and correlation is low, the heap I/O alone costs about `4.0 × 4000 = 16,000`. That already exceeds the entire sequential scan’s disk cost of 10,000. Add the index overhead and the planner chooses the sequential scan despite the tiny row ratio.

## Bitmap and Index‑Only Scans

The planner does not just toggle between seq scan and plain index scan. Two other paths matter.

A **bitmap index scan** builds an in‑memory bitmap of all matching tuple IDs from one or more indexes, sorts them by physical page, and then reads each heap page exactly once.[10] It incurs bitmap construction overhead but avoids repeated random fetches. The cost model charges `cpu_operator_cost` per bitmap entry and then `seq_page_cost` per heap page visited. This strategy often wins when a plain index scan would be too expensive but the row count is still low enough that scanning the whole table is overkill.

An **index‑only scan** goes further: if the index contains every column the query needs, the planner may never touch the heap at all. Heap pages are fetched only when the visibility map says that some tuples on a page are not all‑visible. The cost model subtracts the heap I/O that would otherwise be required. That makes an index‑only scan dramatically cheaper and often the winning path for moderate selectivity even when correlation is poor.

## Why Under a Millisecond

The entire decision runs as basic arithmetic on a handful of cached catalog tuples. There is no internal measurement of disk latency. No probing of the buffer pool. No sampling of the actual rows at planning time. The functions `cost_seqscan`, `cost_index`, and `cost_bitmap_heap_scan` in [costsize.c](https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/costsize.c) are just a few dozen lines of multiplication and addition each. The optimizer can compare several paths, prune them, and hand the winner to the executor in the single‑digit microsecond range for simple queries.

When the planner is wrong, it is usually wrong for one of three reasons: stale statistics, a mismatch between the cost constants and the real I/O subsystem, or correlation that has drifted since the last `ANALYZE`. None of these require a rewrite of the cost model. Just an `ANALYZE` or a gentle tuning of `random_page_cost`.

## Quick Reference

| Property | Value |
|---|---|
| Default `seq_page_cost` | 1.0 |
| Default `random_page_cost` | 4.0 |
| Default `cpu_tuple_cost` | 0.01 |
| Default `cpu_index_tuple_cost` | 0.005 |
| Default `cpu_operator_cost` | 0.0025 |
| `ANALYZE` default sample size | 30,000 rows |
| Page size | 8 KB |
| Correlation coefficient range | -1.0 to +1.0 |
| Planner source file | `src/backend/optimizer/path/costsize.c` |
| Statistics tables | `pg_class`, `pg_statistic` |

## Frequently Asked Questions

**Q: The planner keeps choosing a sequential scan even though my query returns exactly one row. Why?**</br>
Usually `random_page_cost` is too high for your storage. On SSDs, set it to 1.0–1.5. Also check that the index correlation is not near zero; an index on a purely random key will look expensive. Run `ANALYZE` to refresh statistics.

**Q: Can I force the planner to use an index without rewriting the query?**</br>
You can temporarily set `enable_seqscan = off` but that is a hammer. Better to adjust `random_page_cost` or use the `pg_hint_plan` extension to attach explicit hints. PostgreSQL has no built‑in planner hints.

**Q: What does the correlation statistic actually measure?**</br>
It is the Pearson correlation between the logical row order of the index and the physical tuple order on disk. A value of 0.9 means that scanning the index in order visits heap pages almost sequentially. A value near 0 means each index entry sends you to a random page.

**Q: How does the planner handle complex `WHERE` clauses with columns from multiple indexes?**</br>
It can combine indexes using a bitmap scan that builds separate bitmaps and then ANDs or ORs them. The cost model estimates selectivity for each predicate separately and assumes independence when combining them, unless extended statistics tell it otherwise.

**Q: Why does a `LIMIT 10` query sometimes use a sequential scan instead of an index?**</br>
If the startup cost of the index scan plus the cost of retrieving the first ten rows exceeds the cost of the full sequential scan, the planner still goes with the seq scan. The planner is total‑cost‑oriented unless the `LIMIT` is small enough and the index has the right ordering to make the “fraction‑of‑path” cost lower than the seq scan’s total. Tuning `random_page_cost` can tip the balance.

If you want this kind of breakdown every week — not marketing fluff but how real systems actually work under the hood — subscribe to Internals Decoded at [internalsdecoded.com](https://internalsdecoded.com). Next time you stare at an unexpected seq scan, you will know exactly which knob to turn and which statistic to check.

## Sources

- [PostgreSQL Documentation: Planner/Optimizer](https://www.postgresql.org/docs/current/planner-optimizer.html)
- [PostgreSQL Documentation: Index Cost Estimation Functions](https://www.postgresql.org/docs/current/index-cost-estimation.html)
- [PostgreSQL Documentation: Statistics Used by the Planner](https://www.postgresql.org/docs/current/planner-stats.html)
- [PostgreSQL Documentation: Runtime Configuration – Query Planning](https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS)
- [costsize.c on PostgreSQL master](https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/costsize.c)
- [Discussion on correlation and index costing interpolation](https://www.postgresql.org/message-id/flat/5700.1565190139%40sss.pgh.pa.us)