{"id":402,"date":"2026-03-09T13:21:48","date_gmt":"2026-03-09T13:21:48","guid":{"rendered":"https:\/\/blog.rebalai.com\/en\/2026\/03\/09\/postgresql-performance-tuning-what-i-learned-optim\/"},"modified":"2026-03-18T22:00:06","modified_gmt":"2026-03-18T22:00:06","slug":"postgresql-performance-tuning-what-i-learned-optim","status":"publish","type":"post","link":"https:\/\/blog.rebalai.com\/en\/2026\/03\/09\/postgresql-performance-tuning-what-i-learned-optim\/","title":{"rendered":"PostgreSQL Performance Tuning on a 10TB Database: What Actually Worked"},"content":{"rendered":"<p>Our PostgreSQL cluster crossed 10TB sometime in late 2024. I remember because that was also the week our reporting service started returning HTTP 504s intermittently, and the on-call rotation became a lot less fun. Seven engineers on the team, one database, and a growing pile of angry Slack messages from the analytics team.<\/p>\n<p>This is what I learned over the next two months of digging.<\/p>\n<h2>How a 40-Second Query Killed Our Reporting Service<\/h2>\n<p>The first sign was a query in our reporting pipeline that had been running in under 500ms for months \u2014 suddenly averaging 40 seconds. Not always. Just&#8230; sometimes. Which is almost worse than always, because intermittent slowness is genuinely hard to reproduce.<\/p>\n<p>My first instinct was &#8220;missing index.&#8221; Wrong. Or \u2014 okay, not entirely wrong, but that wasn&#8217;t the root cause. The table in question (<code>user_events<\/code>, ~800M rows, partitioned by month) had indexes. The query planner was using them. 
But the <code>EXPLAIN ANALYZE<\/code> output showed something I hadn&#8217;t seen before at this scale:<\/p>\n<pre><code class=\"language-sql\">EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)\nSELECT\n  user_id,\n  event_type,\n  COUNT(*) AS event_count\nFROM user_events\nWHERE\n  created_at &gt;= '2024-10-01'\n  AND created_at &lt; '2024-11-01'\n  AND event_type = ANY(ARRAY['click', 'purchase', 'view'])\nGROUP BY user_id, event_type;\n<\/code><\/pre>\n<pre><code>Index Scan using idx_user_events_created_at on user_events_2024_10\n  (cost=0.57..892341.22 rows=2847293 width=24)\n  (actual time=0.843..38421.337 rows=2847293 loops=1)\n  Buffers: shared hit=12847 read=1829403 dirtied=0 written=0\n  I\/O Timings: read=36891.442 write=0.000\n<\/code><\/pre>\n<p>See that <code>read=1829403<\/code> vs <code>hit=12847<\/code>? Buffer hit ratio under 1%. Everything was going to disk. But the index existed and was supposedly being used, so I spent two days assuming the problem was <code>work_mem<\/code> or <code>shared_buffers<\/code> being too low. Tuned both. Query was still slow.<\/p>\n<p>The actual culprit was index bloat from dead tuples. Autovacuum hadn&#8217;t kept up with a particularly write-heavy period a few weeks earlier, and the index had swelled with dead entries. The planner was traversing enormous amounts of garbage to reach live rows.<\/p>\n<h2>What pg_stat_statements Actually Told Me (vs. What I Expected)<\/h2>\n<p>Before I arrived at the autovacuum diagnosis, I spent a few days inside <code>pg_stat_statements<\/code>. 
If you&#8217;re not already querying this regularly, you&#8217;re missing what is probably the most useful diagnostic tool Postgres ships with.<\/p>\n<pre><code class=\"language-sql\">SELECT\n  query,\n  calls,\n  round(mean_exec_time::numeric, 2)   AS mean_ms,\n  round(total_exec_time::numeric, 2)  AS total_ms,\n  round(stddev_exec_time::numeric, 2) AS stddev_ms,\n  rows \/ NULLIF(calls, 0)             AS avg_rows\nFROM pg_stat_statements\nWHERE calls &gt; 100\nORDER BY total_exec_time DESC\nLIMIT 20;\n<\/code><\/pre>\n<p>A few things surprised me. The query I was fixated on \u2014 the 40-second monster \u2014 wasn&#8217;t even in the top 5 by total execution time. Three other queries were collectively burning more CPU, each running thousands of times per day. Short calls, individually fast, but hammering the database constantly from a background job polling every 15 seconds. Nobody had noticed because no single call looked alarming.<\/p>\n<p>So the reporting crisis was user-visible and urgent. But the background polling was quietly degrading our read replicas, and I hadn&#8217;t known about it until I looked here.<\/p>\n<p>The <code>stddev_exec_time<\/code> column is something I now look at almost more than mean. A query with 200ms mean and 800ms stddev concerns me more than a flat 300ms mean. High variance usually signals something inconsistent \u2014 sometimes cache-warm, sometimes cold, sometimes hitting lock contention.<\/p>\n<p>One thing I noticed after a planned failover: <code>pg_stat_statements<\/code> resets on PostgreSQL restart unless you&#8217;ve set <code>pg_stat_statements.save = on<\/code>. We hadn&#8217;t. I lost two weeks of history and had to start fresh. 
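<\/p>\n<p>In <code>postgresql.conf<\/code>, that persistence amounts to two settings (note that <code>pg_stat_statements.save<\/code> only preserves counters across clean shutdowns, not crashes):<\/p>\n<pre><code># postgresql.conf: changing shared_preload_libraries requires a restart\nshared_preload_libraries = 'pg_stat_statements'\npg_stat_statements.save = on   # persist counters across clean restarts\n<\/code><\/pre>\n<p>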
Set that to <code>on<\/code> immediately.<\/p>\n<h2>Why My Indexes Were Wrong, and What Index Bloat Actually Does to You<\/h2>\n<p>Here is the thing about partial indexes: I knew they existed. I&#8217;d used them on smaller databases. But at 800M rows, the difference between a full index and a well-scoped partial index is enormous \u2014 in query time and in ongoing maintenance cost.<\/p>\n<p>We had:<\/p>\n<pre><code class=\"language-sql\">CREATE INDEX idx_user_events_event_type\n  ON user_events (event_type, created_at);\n<\/code><\/pre>\n<p>In reality, 90% of our queries only touched the last 90 days and a handful of event types. The index covered three years of data we almost never queried. After analyzing access patterns, I replaced it with:<\/p>\n<pre><code class=\"language-sql\">CREATE INDEX CONCURRENTLY idx_user_events_recent_active\n  ON user_events (event_type, user_id, created_at)\n  WHERE created_at &gt;= '2024-01-01'\n    AND event_type IN ('click', 'purchase', 'view', 'signup');\n<\/code><\/pre>\n<p>Smaller index, faster scans, less bloat surface. Query time on the reporting query dropped from 40 seconds to about 1.2 seconds on the first run. I was genuinely shocked \u2014 I&#8217;d been expecting maybe a 50% improvement, not 97%.<\/p>\n<p>One warning: I pushed this on a Friday afternoon. Specifically, a Friday at 4pm. <code>CREATE INDEX CONCURRENTLY<\/code> on a large table runs for hours (ours took 3.5), and while it avoids an exclusive lock on the table for most of that time, it does acquire locks during the final phase. It also hammers disk I\/O during construction. Production slowed noticeably for about 20 minutes near the end. The analytics team filed a P2. 
I learned my lesson about Friday deployments the hard way.<\/p>\n<p>For the bloat specifically \u2014 I diagnosed it with <code>pgstattuple<\/code>:<\/p>\n<pre><code class=\"language-sql\">SELECT * FROM pgstattuple('idx_user_events_created_at');\n<\/code><\/pre>\n<p>The <code>dead_tuple_percent<\/code> was sitting at 31%. Anything over roughly 10% is worth addressing. A <code>REINDEX CONCURRENTLY<\/code> brought it down under 2% and immediately improved our buffer hit ratio. That single operation recovered most of the query time regression before I&#8217;d even touched the index structure itself.<\/p>\n<h2>Autovacuum Was the Thing Nobody on My Team Wanted to Talk About<\/h2>\n<p>Autovacuum is mentioned in every PostgreSQL tutorial. Most teams then ignore it until something breaks. We ignored it. Something broke.<\/p>\n<p><code>user_events<\/code> was receiving about 2 million writes per day. The default autovacuum settings \u2014 sized for much smaller tables \u2014 couldn&#8217;t keep up. Dead tuples accumulated, bloated the indexes, and eventually pushed the planner toward worse choices.<\/p>\n<p>The fix was per-table autovacuum configuration:<\/p>\n<pre><code class=\"language-sql\">ALTER TABLE user_events SET (\n  autovacuum_vacuum_scale_factor = 0.01,  -- trigger when 1% of rows are dead (default 0.2)\n  autovacuum_analyze_scale_factor = 0.005,\n  autovacuum_vacuum_cost_delay = 2        -- sleep between cost-limited rounds; default was 20ms before PostgreSQL 12, 2ms since\n);\n<\/code><\/pre>\n<p>The default <code>autovacuum_vacuum_scale_factor<\/code> of 0.2 on an 800M-row table means autovacuum doesn&#8217;t trigger until 160 million dead rows have accumulated. That&#8217;s completely unworkable for a write-heavy table. Dropping it to 0.01 triggers at 8 million rows \u2014 still substantial, but manageable. 
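<\/p>\n<p>For reference, that arithmetic comes straight from the autovacuum trigger formula in the Postgres docs:<\/p>\n<pre><code>vacuum threshold = autovacuum_vacuum_threshold\n                 + autovacuum_vacuum_scale_factor * number of tuples\n\ndefault: 50 + 0.2  * 800,000,000  = ~160 million dead tuples\ntuned:   50 + 0.01 * 800,000,000  = ~8 million dead tuples\n<\/code><\/pre>\n<p>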
Combined with bumping <code>autovacuum_max_workers<\/code> from 3 to 6 globally, the bloat stabilized within a week.<\/p>\n<p>I&#8217;m not 100% sure this exact configuration scales beyond our specific write pattern. If your table takes large bulk deletes rather than steady row churn, the math changes. Your mileage may vary.<\/p>\n<p>What genuinely caught me off guard: none of our monitoring dashboards were surfacing any of this. We had query time alerts, CPU alerts, disk alerts. We had zero visibility into <code>n_dead_tup<\/code> on our largest tables. I added a Grafana panel pulling from <code>pg_stat_user_tables<\/code> \u2014 <code>n_dead_tup<\/code>, <code>last_autovacuum<\/code>, <code>n_tup_del<\/code> \u2014 and within two days it revealed three other tables with similar bloat building up, none of which had yet degraded into visible slowness.<\/p>\n<p>That dashboard has saved us more than any other single change from this whole project.<\/p>\n<h2>What I&#8217;d Do Differently Starting from Zero<\/h2>\n<p>The biggest mistake wasn&#8217;t any specific technical choice. It was treating database performance as a reactive problem \u2014 queries slowed, we investigated; bloat grew, we responded. We had no proactive visibility into what was quietly building up.<\/p>\n<p>Honest answer: I&#8217;d flip the order of everything I actually did. The Grafana panel pulling from <code>pg_stat_user_tables<\/code> should have been week one, not week eight. Slow query logging should have been set to 500ms from the start, not 5 seconds \u2014 at 5 seconds you&#8217;re only seeing things that are already on fire. At 500ms you catch them while they&#8217;re still cheap to fix. I knew this in theory. 
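<\/p>\n<p>It is a single line in <code>postgresql.conf<\/code>:<\/p>\n<pre><code>log_min_duration_statement = 500   # log any statement slower than 500ms\n<\/code><\/pre>\n<p>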
I didn&#8217;t actually do it.<\/p>\n<p><code>pg_stat_statements<\/code> should be persisted across restarts and reviewed weekly \u2014 by a human, not just alert rules. Alerts catch acute failures. The queries that gradually degrade over three months, barely over your SLA at first and then slowly worse, only show up if someone actually sits down and looks at <code>stddev_exec_time<\/code> outliers on a schedule.<\/p>\n<p>On indexes: treat them as a liability until proven otherwise. Every index is a maintenance contract autovacuum has to honor on every write. We dropped 14 of the 40+ indexes on <code>user_events<\/code> and write performance measurably improved \u2014 which still surprises me a little when I think back on it. Model access patterns first, use partial indexes for tables with temporal skew, and resist the urge to index your way out of a slow query before you&#8217;ve checked everything else.<\/p>\n<p>And get comfortable with <code>EXPLAIN (ANALYZE, BUFFERS)<\/code>, not just <code>EXPLAIN<\/code>. Buffer hit statistics \u2014 shared hits versus reads from disk \u2014 usually explain more than the row estimates do. Row estimates are frequently wrong anyway; buffer stats are ground truth.<\/p>\n<p>We&#8217;re now running at roughly 95ms p95 on the reporting queries that were timing out four months ago. Same hardware, no infrastructure changes. 
Just profiling, index surgery, and finally taking autovacuum seriously.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Our PostgreSQL cluster crossed 10TB sometime in late 2024.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[1],"tags":[],"class_list":["post-402","post","type-post","status-publish","format-standard","hentry","category-general"],"_links":{"self":[{"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/posts\/402","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/comments?post=402"}],"version-history":[{"count":5,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/posts\/402\/revisions"}],"predecessor-version":[{"id":530,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/posts\/402\/revisions\/530"}],"wp:attachment":[{"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/media?parent=402"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/categories?post=402"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.rebalai.com\/en\/wp-json\/wp\/v2\/tags?post=402"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}