[CSDN Editor's Note] At a time when tuning databases for performance is the norm, the author of this piece goes the other way: in a stunt bordering on performance art, he shows the flip side of Postgres configuration, tweaking 32 parameters in postgresql.conf to make the database roughly 42,000 times slower.
Original post: https://byteofdev.com/posts/making-postgres-slow/
Author | Jacob Jackson
Translation | Zheng Liyuan
Published by | CSDN (ID: CSDNnews)
Submissions and story tips | zhanghy@csdn.net
Everyone seems to be busy working out how to make Postgres faster and more efficient, but has anyone ever wondered how to make it as slow as possible?
Sure, I know most people who study performance tuning do it for a paycheck, but I am currently unemployed. I was in the middle of writing a serious Postgres performance-tuning guide when a thought struck me: why not try the "anti-optimization" version and configure Postgres to process queries as slowly as it possibly can?
I have no job at the moment anyway, so let's just mess around for once!
Ground rules
I couldn't let this be too easy. This is strictly a Postgres parameter-tuning challenge: no cheap tricks like downclocking the CPU to 1 MHz or dropping every index; every change has to be a parameter in postgresql.conf. On top of that, the database still has to complete at least one transaction in a reasonable amount of time, because simply wedging Postgres outright would take no skill at all.
That makes the task harder than it looks, because Postgres does its best, through built-in limits and sane defaults, to stop users from making exactly these kinds of foolish decisions.
To measure performance, I used the TPC-C benchmark as implemented by BenchBase, configured with 128 warehouses and 100 connections (each attempting to issue transactions at 10,000 TPS), against Postgres 19devel on Linux 6.15.6, running on a Ryzen 7950X with 32 GB of RAM and a 2 TB SSD.
Each round lasted 120 seconds and was executed twice: the first pass to warm the cache, the second to collect data.
For the baseline run I kept every setting at its default apart from a few conventional adjustments (raising shared_buffers, work_mem, and the number of worker processes), which yielded a respectable 7,082 TPS (transactions per second).
Now let's see just how slow I can make Postgres.
Caching? No thanks…
Much of Postgres's speed on read queries comes down to its caching. Reading from disk is slow, so every time Postgres reads a block from disk it caches that block in memory, and the next query that needs the block can read it straight from memory.
My goal, however, is to force every query onto the slowest read path available, so the smaller this cache, the better.
In theory the cache size is controlled by the shared_buffers parameter. Sadly it can't be set to 0, because Postgres also uses it to hold the database pages it is actively working on. Fortunately, I can still make it very small.
First, I tried dropping shared_buffers from the 10 GB used in the baseline down to 8 MB:
shared_buffers = 8MB
With this setting, TPS fell to 1,052, roughly a seventh of the starting speed:
With the buffer pool shrunk, Postgres can hold far fewer pages in memory, so the share of page requests satisfied without touching the operating system plunged from 99.90% to 70.52%, driving a nearly 300-fold increase in read system calls.
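If you want to watch that ratio yourself, it can be computed from the standard pg_stat_database view, where blks_hit counts buffer-cache hits and blks_read counts blocks fetched from the OS. A query along these lines (my addition, not part of the original post) does the job:
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database
WHERE datname = current_database();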
But 70% is still far too high; surely we can do "better" and shrink the cache even more. So next I tried 128 kB, which promptly produced an error:
Oops: a 128 kB shared buffer pool can hold at most 16 database pages (not counting everything else that lives in shared buffers), and Postgres may need to touch more than 16 pages at once. After some fiddling, I found the minimum workable value to be around 2 MB:
shared_buffers = 2MB
At this point, Postgres was already below 500 TPS:
Keeping Postgres busy with background chores
Besides processing transactions, Postgres runs some fairly compute-hungry background tasks, and I can use those to cause delays too. To reduce storage fragmentation, Postgres periodically runs the autovacuum process, which hunts for free space (such as the space left behind by deleted rows) and tries to make it reusable.
Normally, to avoid hurting performance, autovacuum only runs after a certain number of changes have accumulated. But I can tune the thresholds so that it can fire at any moment:
autovacuum_vacuum_insert_threshold = 1 # autovacuum can be triggered with only 1 insert
autovacuum_vacuum_threshold = 0 # minimum number of inserts, updates, or deletes needed to trigger a vacuum
autovacuum_vacuum_scale_factor = 0 # proportion of the unfrozen table size to consider when calculating thresholds
autovacuum_vacuum_max_threshold = 1 # max number of inserts, updates, or deletes needed to trigger a vacuum
autovacuum_naptime = 1 # the minimum delay between autovacuums in seconds; unfortunately, this cannot be set below 1, which limits us
vacuum_cost_limit = 10000 # query cost limit, which, if exceeded, will cause the vacuum to pause; I don't want the vacuum to ever stop, so I maxed this out
vacuum_cost_page_dirty = 0
vacuum_cost_page_hit = 0
vacuum_cost_page_miss = 0 # all of these minimize the cost for operations when calculating for `vacuum_cost_limit`
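For context, autovacuum considers a table ripe for vacuuming roughly when its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (table row count), so with both knobs floored, a single dead tuple qualifies. One way to watch the resulting churn (my addition, not from the original post) is the standard pg_stat_user_tables view:
SELECT relname, n_dead_tup, autovacuum_count, last_autovacuum
FROM pg_stat_user_tables
ORDER BY autovacuum_count DESC
LIMIT 5;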
I also tuned the autovacuum analyzer, the component that gathers the statistics used for later vacuums and for query planning (not that accurate statistics could stop my sabotage):
autovacuum_analyze_threshold = 0 # same as autovacuum_vacuum_threshold, but for ANALYZE
autovacuum_analyze_scale_factor = 0 # same as autovacuum_vacuum_scale_factor
On top of that, I needed the vacuum process itself to run as slowly as possible:
maintenance_work_mem = 128kB # the amount of memory allocated for vacuuming processes
log_autovacuum_min_duration = 0 # the duration (in milliseconds) that an autovacuum operation is required to run for before it is logged; I might as well log everything
logging_collector = on # enables logging in general
log_destination = stderr,jsonlog # sets the output format/file for logs
One caveat: disabling autovacuum entirely would also degrade performance over time as dead tuples piled up. But since this test is insert-heavy and lasts only two minutes, I decided that approach wasn't "slow" enough.
The result? TPS dropped to 293:
Postgres is now below 1/20 of its original performance, and the logs confirm what is dragging it down:
2025-07-20 09:10:20.455 EDT [25210] LOG: automatic vacuum of table "benchbase.public.warehouse": index scans: 0
pages: 0 removed, 222 remain, 222 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 354 remain, 226 are dead but not yet removable
removable cutoff: 41662928, which was 523 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 116.252 MB/s, avg write rate: 4.824 MB/s
buffer usage: 254 hits, 241 reads, 10 dirtied
WAL usage: 2 records, 2 full page images, 16336 bytes, 1 buffers full
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s
2025-07-20 09:10:20.773 EDT [25210] LOG: automatic analyze of table "benchbase.public.warehouse"
avg read rate: 8.332 MB/s, avg write rate: 0.717 MB/s
buffer usage: 311 hits, 337 reads, 29 dirtied
WAL usage: 36 records, 5 full page images, 42524 bytes, 4 buffers full
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.31 s
2025-07-20 09:10:20.933 EDT [25210] LOG: automatic vacuum of table "benchbase.public.district": index scans: 0
pages: 0 removed, 1677 remain, 1008 scanned (60.11% of total), 0 eagerly scanned
tuples: 4 removed, 2047 remain, 557 are dead but not yet removable
removable cutoff: 41662928, which was 686 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan bypassed: 2 pages from table (0.12% of total) have 9 dead item identifiers
avg read rate: 50.934 MB/s, avg write rate: 9.945 MB/s
buffer usage: 1048 hits, 1009 reads, 197 dirtied
WAL usage: 6 records, 1 full page images, 8707 bytes, 0 buffers full
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.15 s
2025-07-20 09:10:21.220 EDT [25210] LOG: automatic analyze of table "benchbase.public.district"
avg read rate: 47.235 MB/s, avg write rate: 1.330 MB/s
buffer usage: 115 hits, 1705 reads, 48 dirtied
WAL usage: 30 records, 1 full page images, 17003 bytes, 1 buffers full
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.28 s
2025-07-20 09:10:21.543 EDT [25212] LOG: automatic vacuum of table "benchbase.public.warehouse": index scans: 0
pages: 0 removed, 222 remain, 222 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 503 remain, 375 are dead but not yet removable
removable cutoff: 41662928, which was 845 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 131.037 MB/s, avg write rate: 5.083 MB/s
buffer usage: 268 hits, 232 reads, 9 dirtied
WAL usage: 1 records, 0 full page images, 258 bytes, 0 buffers full
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s
2025-07-20 09:10:21.813 EDT [25212] LOG: automatic analyze of table "benchbase.public.warehouse"
avg read rate: 10.244 MB/s, avg write rate: 0.851 MB/s
buffer usage: 307 hits, 337 reads, 28 dirtied
WAL usage: 33 records, 3 full page images, 30864 bytes, 2 buffers full
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.25 s
# ... it continues similarly
In short: Postgres now runs autovacuum and analyze against the hot tables nearly every second, and with the cache hit rate so low it has to keep rereading data from disk. Better yet, since barely anything changes between rounds, almost all of that work is wasted effort.
But… 293 TPS still feels too high to me.
Turning Postgres into a Brandon Sanderson-grade prolific writer
Brandon Sanderson is famously prolific. You know what else can be prolific? This Postgres instance of mine, once I'm done with its WAL configuration.
Before committing changes to the actual database files, Postgres first writes them to the WAL (write-ahead log) and later syncs them to disk via checkpoints. The WAL is highly configurable, which plays right into my hands. For a start, Postgres normally buffers some WAL data in memory and writes it to disk in batches. I can't have it being that efficient:
wal_writer_flush_after = 0 # the minimum amount of WAL produced that requires a flush
wal_writer_delay = 1 # the minimum delay between flushes
I also wanted the WAL to checkpoint as frequently as possible:
min_wal_size = 32MB # minimum WAL size after checkpointing; I want to checkpoint as much as possible
max_wal_size = 32MB # max WAL size, after which a checkpoint will happen. Unfortunately, I have to set both at 32MB minimum to match 2 WAL segments
checkpoint_timeout = 30 # max time between checkpoints in seconds; 30s is the minimum
checkpoint_flush_after = 1 # flush writes to disk after every 8kB
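On Postgres 17 and later, checkpoint activity is tallied in the pg_stat_checkpointer system view, so a query like the one below (my addition, assuming a 17+ server) shows how quickly num_requested climbs under this configuration:
SELECT num_timed, num_requested, write_time, sync_time, buffers_written
FROM pg_stat_checkpointer;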
Then, maximize the volume of WAL written:
wal_sync_method = open_datasync # the method of flushing to disk; this should be the slowest
wal_level = logical # makes the WAL output additional information for replication. The extra info isn't needed, but it hurts performance
wal_log_hints = on # forces the WAL to write out full modified pages
summarize_wal = on # another extra process for backups
track_wal_io_timing = on # more information collected
checkpoint_completion_target = 0 # prevents spreading the I/O load at all
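The damage from wal_log_hints in particular shows up as full-page images, which the standard pg_stat_wal view counts in its wal_fpi column. A quick look (my addition, using columns present in recent Postgres releases):
SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full
FROM pg_stat_wal;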
The result: TPS dropped to 98, below 1/70 of the baseline.
The logs confirm that WAL inefficiency is to blame:
2025-07-20 12:33:17.211 EDT [68697] LOG: checkpoint complete: wrote 19 buffers (7.4%), wrote 2 SLRU buffers; 0 WAL file(s) added, 3 removed, 0 recycled; write=0.094 s, sync=0.042 s, total=0.207 s; sync files=57, longest=0.004 s, average=0.001 s; distance=31268 kB, estimate=31268 kB; lsn=1B7/3CDC1B80, redo lsn=1B7/3C11CD48
2025-07-20 12:33:17.458 EDT [68697] LOG: checkpoints are occurring too frequently (0 seconds apart)
2025-07-20 12:33:17.458 EDT [68697] HINT: Consider increasing the configuration parameter "max_wal_size".
2025-07-20 12:33:17.494 EDT [68697] LOG: checkpoint starting: wal
2025-07-20 12:33:17.738 EDT [68697] LOG: checkpoint complete: wrote 18 buffers (7.0%), wrote 1 SLRU buffers; 0 WAL file(s) added, 2 removed, 0 recycled; write=0.089 s, sync=0.047 s, total=0.280 s; sync files=50, longest=0.009 s, average=0.001 s; distance=34287 kB, estimate=34287 kB; lsn=1B7/3F1F7B18, redo lsn=1B7/3E298BA0
2025-07-20 12:33:17.923 EDT [68697] LOG: checkpoints are occurring too frequently (0 seconds apart)
2025-07-20 12:33:17.923 EDT [68697] HINT: Consider increasing the configuration parameter "max_wal_size".
2025-07-20 12:33:17.971 EDT [68697] LOG: checkpoint starting: wal
Right, WAL checkpoints are normally not supposed to happen this often (a mere 487 milliseconds apart)… but it still wasn't enough.
Effectively "deleting" the indexes
Remember how I said at the start that I couldn't touch the indexes? As it happens, I can achieve the same effect by making the query planner loathe them.
When estimating query costs, Postgres distinguishes random page access from sequential access, and the former is generally far slower. Index scans require random page access, whereas full table scans are mostly sequential. So if the cost of random access is set high enough, the planner gives up on indexes entirely.
random_page_cost = 1e300 # sets the cost of accessing a random page
cpu_index_tuple_cost = 1e300 # sets the cost of processing one tuple from an index
Changing just these two parameters disables index usage in nearly every situation. I did end up raising shared_buffers back to 8MB so the full table scans wouldn't error out, but that was obviously no great help to performance.
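The planner's change of heart is visible with a plain EXPLAIN. The sketch below is mine rather than the author's, using the TPC-C warehouse table from the benchmark; with costs in the 1e300 range, even a primary-key lookup should come out as a sequential scan:
-- With the costs above, even this primary-key lookup should plan as:
--   Seq Scan on warehouse
--     Filter: (w_id = 1)
-- rather than an Index Scan on the warehouse primary key.
EXPLAIN SELECT * FROM warehouse WHERE w_id = 1;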
With that, TPS fell to 0.87. That's less than one transaction per second, over 7,000 times slower than the default configuration, and all of it from edits to postgresql.conf.
But I still had one last trick.
Forcing I/O onto a single thread
While I can't make Postgres fully single-threaded (each connection still gets its own process), Postgres 18 added the io_method parameter, which controls whether backends issue I/O system calls synchronously (io_method = sync), hand them off to asynchronous worker processes (io_method = worker), or use Linux's newer io_uring API (io_method = io_uring). Combined with io_workers (which sets the number of I/O worker processes when io_method = worker), I can force all I/O through a single worker:
io_method = worker
io_workers = 1
TPS then plunged to 0.016, officially through the floor, a slowdown of more than 42,000 times! Excluding transactions that never completed due to deadlocks makes it look even "better": across 120 seconds, the 100 connections successfully completed just 11 transactions.
Wrapping up
After a few hours of fiddling and 32 changed parameters, I had successfully wrecked a Postgres database.
Who knew that editing postgresql.conf alone could trash Postgres performance this thoroughly? I expected to push TPS down to single digits at best; I had no idea Postgres could be tormented to this degree. If you want to reproduce the result, here is the full list of non-default parameters:
shared_buffers = 8MB
autovacuum_vacuum_insert_threshold = 1
autovacuum_vacuum_threshold = 0
autovacuum_vacuum_scale_factor = 0
autovacuum_vacuum_max_threshold = 1
autovacuum_naptime = 1
vacuum_cost_limit = 10000
vacuum_cost_page_dirty = 0
vacuum_cost_page_hit = 0
vacuum_cost_page_miss = 0
autovacuum_analyze_threshold = 0
autovacuum_analyze_scale_factor = 0
maintenance_work_mem = 128kB
log_autovacuum_min_duration = 0
logging_collector = on
log_destination = stderr,jsonlog
wal_writer_flush_after = 0
wal_writer_delay = 1
min_wal_size = 32MB
max_wal_size = 32MB
checkpoint_timeout = 30
checkpoint_flush_after = 1
wal_sync_method = open_datasync
wal_level = logical
wal_log_hints = on
summarize_wal = on
track_wal_io_timing = on
checkpoint_completion_target = 0
random_page_cost = 1e300
cpu_index_tuple_cost = 1e300
io_method = worker
io_workers = 1
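To double-check which of these a running instance actually picked up, you can ask the standard pg_settings view for everything sourced from the configuration file (this query is my addition):
SELECT name, setting, unit
FROM pg_settings
WHERE source = 'configuration file'
ORDER BY name;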
You can test this configuration by installing the Postgres flavor of BenchBase: use the TPC-C sample config with a 120-second run, 120 seconds of warmup, 128 warehouses, 100 connections, and a maximum throughput of 50k TPS. You could even try to outdo me and drive performance lower still; after all, I only picked the parameters I judged most damaging to Postgres performance, and there are plenty I never tested.
Reader comments: sounds like something to use at work?
The post drew plenty of interest from developers on Reddit, quite a few of whom suggested it could be "applied" on the job:
"So if I understand correctly, we can spin up a new Postgres instance with this configuration, then switch it back to the defaults and claim we made the app 42,000 times faster?"
One commenter noted: "You don't have to wait until you're unemployed to do this. I do it at my day job all the time."
Another reflected that this kind of playful exploration only happens off the clock: "That's the difference between work and hobby fun. Tinkering with something and then wondering 'what happens if I do this' is always a good time."