The migration was small. A NOT NULL constraint on a column added in February.
The team had high confidence: tests passed, code review was clean, the deploy plan had been rehearsed.
Production had 14 million rows in that table.
Our test suite did not catch the problem because it used a mock that recorded "constraint added" and moved on. Postgres, on real data, was rewriting half a billion bytes of disk while the API timed out. The mock layer was honest about syntax. It lied about behaviour.
Mocks are useful. They let you test fast and isolate logic. But in any system that handles money, the database is not a dependency you can mock around.
Since that morning, every engagement we sign starts with the same five rules.
1. Run the real database, locally and in CI
H2 standing in for Postgres. SQLite simulating MySQL. Mock layers that record SQL strings and move on. All of them lie about constraint behaviour, index plans, transaction isolation, and timezone handling.
The cost of running real Postgres in a container is roughly one minute of CI time per run. The cost of not running it is six figures and a board meeting.
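To make the failure mode concrete, here is a minimal sketch of the kind of recording mock described above. The class name RecordingConnection, the accounts table and the kyc_status column are all hypothetical, invented for illustration. The only point is that the check passes no matter what the data looks like.

```python
class RecordingConnection:
    """Hypothetical stand-in for a database connection.

    It records every SQL string it is given and never touches any data.
    """

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def apply_migration(conn):
    # Hypothetical migration: table and column names are illustrative.
    conn.execute("ALTER TABLE accounts ALTER COLUMN kyc_status SET NOT NULL")


def mock_test_passes():
    conn = RecordingConnection()
    apply_migration(conn)
    # This only checks that the statement was issued. It says nothing about
    # NULLs already in the table or how long the statement takes on
    # 14 million rows.
    return conn.statements == [
        "ALTER TABLE accounts ALTER COLUMN kyc_status SET NOT NULL"
    ]
```

The same migration pointed at a real Postgres instance has to deal with the rows that are actually there, which is exactly the part the mock skips.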
2. Fresh container per test run
Shared databases drift. A migration applied in March affects a test written in June. The fix is boring: spin up an isolated container, apply your migrations from zero, run the suite, throw it away.
If your migration pipeline can't survive that, it can't survive a real outage either.
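The spin-up-and-throw-away loop can be sketched in a few lines. This assumes the Docker CLI is on the path and uses the official postgres image. The function names, the default image tag and the password are placeholders, not a recommendation.

```python
import subprocess
from contextlib import contextmanager


def docker_run_args(image="postgres:16", password="test-only"):
    """Build the `docker run` argument list for a throwaway Postgres."""
    return [
        "docker", "run",
        "--rm",                           # remove the container when it stops
        "--detach",                       # print the container id and return
        "--env", f"POSTGRES_PASSWORD={password}",
        "--publish", "127.0.0.1::5432",   # bind a random free host port
        image,
    ]


@contextmanager
def fresh_postgres(image="postgres:16"):
    """Start an isolated container, yield its id, and always throw it away."""
    container_id = subprocess.run(
        docker_run_args(image), check=True, capture_output=True, text=True
    ).stdout.strip()
    try:
        yield container_id
    finally:
        # Stopping a --rm container also deletes it, so no state survives.
        subprocess.run(["docker", "stop", container_id], check=True)
```

Inside the with block you would wait for the server to accept connections, apply every migration from zero, and run the suite. Running docker port with the container id and 5432 reports which host port was assigned.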
3. Test the rollback, not just the apply
Most production migration failures we've seen happen at the rollback step, not the forward apply. The forward path is rehearsed. The rollback is written once by a tired engineer at 4pm on a Friday and never run.
Write the rollback test before you sign off on the migration. Run it in CI. If the rollback can't restore the original schema on real data, the migration isn't ready.
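One way to phrase that check, as a sketch: snapshot the schema, apply, roll back, and compare. The snapshot below covers columns only and assumes a DB-API cursor with %s placeholders (as psycopg uses). A fuller version would also compare constraints and indexes. Both function names are made up for this example.

```python
def snapshot_schema(cursor, schema="public"):
    """Return the column layout of every table in `schema` as an ordered list.

    Assumes a DB-API cursor using the %s parameter style.
    """
    cursor.execute(
        """
        SELECT table_name, column_name, data_type, is_nullable, column_default
        FROM information_schema.columns
        WHERE table_schema = %s
        ORDER BY table_name, ordinal_position
        """,
        (schema,),
    )
    return [tuple(row) for row in cursor.fetchall()]


def roundtrip_restores_schema(snapshot, apply, rollback):
    """Apply, then roll back, and report whether the schema came back.

    `snapshot`, `apply` and `rollback` are zero-argument callables so the
    check stays independent of any particular migration tool.
    """
    before = snapshot()
    apply()
    changed = snapshot()
    rollback()
    after = snapshot()
    # If the forward step changed nothing, the test proves nothing.
    return changed != before and after == before
```

In CI, snapshot would wrap snapshot_schema against the fresh container, and apply and rollback would call whatever migration tool the project already uses.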
4. Match production exactly
Same extensions. Same encoding. Same collation. Same timezone. Same Postgres minor version, ideally.
We spent six hours in 2023 debugging an "intermittent" off-by-one on date filters. It was a timezone mismatch: the test container ran in UTC while the production cluster was pinned to IST. Six hours of senior engineers wondering if the bug was in the ORM. It was in the timezone the container started up with.
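A cheap guard against this class of bug is to fingerprint both environments and fail the build on any difference. The sketch below assumes a DB-API cursor for each side. The dictionary keys and function names are our own invention, while the SQL uses standard Postgres commands and catalogs.

```python
# Each query returns a single value describing one aspect of the server.
FINGERPRINT_QUERIES = {
    "server_version": "SHOW server_version",
    "timezone": "SHOW TimeZone",
    "server_encoding": "SHOW server_encoding",
    "collation": (
        "SELECT datcollate FROM pg_database "
        "WHERE datname = current_database()"
    ),
    "extensions": (
        "SELECT string_agg(extname || ' ' || extversion, ', ' ORDER BY extname) "
        "FROM pg_extension"
    ),
}


def fingerprint(cursor):
    """Collect one value per query from a DB-API cursor."""
    result = {}
    for name, sql in FINGERPRINT_QUERIES.items():
        cursor.execute(sql)
        result[name] = cursor.fetchone()[0]
    return result


def mismatches(production, test):
    """Return {setting: (production_value, test_value)} for every difference."""
    return {
        key: (production.get(key), test.get(key))
        for key in sorted(set(production) | set(test))
        if production.get(key) != test.get(key)
    }
```

The production fingerprint can be captured once and checked into the repository, so CI never needs production credentials. It just compares the container against the saved values.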
5. Run at realistic volume at least once
A test database with 500 rows hides query plans that fall over at 5 million. You don't need millions of rows in every CI run, but you need at least one nightly job that runs against that many.
The query plan that wins at 500 rows is rarely the same query plan that wins at production scale. Postgres' planner is doing its job. It just doesn't have the data to do it the way production will.
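A nightly volume job could look roughly like this: seed the table server-side with generate_series, run ANALYZE so the planner has statistics, then inspect the EXPLAIN (FORMAT JSON) output of the queries you care about. The two-column table layout and both function names are hypothetical, and treating every sequential scan as suspicious is a simplifying assumption, not a rule.

```python
def seed_sql(table, rows):
    """Build a statement that fills a hypothetical (id, created_at) table.

    generate_series produces the rows inside Postgres, so nothing is
    shipped over the wire. `table` comes from test code, never user input.
    Follow the insert with ANALYZE so the planner sees the new data.
    """
    return (
        f"INSERT INTO {table} (id, created_at) "
        f"SELECT n, now() - n * interval '1 second' "
        f"FROM generate_series(1, {int(rows)}) AS n"
    )


def seq_scans(explain_json):
    """List relations read by a Seq Scan in parsed EXPLAIN (FORMAT JSON) output."""
    found = []

    def walk(node):
        if node.get("Node Type") == "Seq Scan":
            found.append(node.get("Relation Name"))
        # Child plan nodes, when present, live under the "Plans" key.
        for child in node.get("Plans", []):
            walk(child)

    for statement in explain_json:
        walk(statement["Plan"])
    return found
```

The nightly job can then assert that seq_scans returns an empty list for the handful of queries that must stay on an index at production scale.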
None of this makes tests fast
It makes them honest.
In fintech, "we're sorry, our tests passed" is not a defence the regulator accepts. Build the test suite the regulator would write.