PostgreSQL 16 服务器配置与数据库监控终极指南 —语法、案例与实战
✅ 一、服务器配置概述
PostgreSQL 16 的服务器配置主要通过 配置文件 和 SQL 命令 控制,涵盖连接、资源、日志、查询优化、统计收集等核心模块。
📁 主要配置文件:
postgresql.conf:主配置文件(全局参数)pg_hba.conf:客户端认证配置(Host-Based Authentication)pg_ident.conf:用户映射配置(可选)
⚙️ 配置方式优先级:
- 会话级设置:
SET LOCAL ...(事务内生效)- 会话级设置:
SET ...(当前连接生效)- 数据库级设置:
ALTER DATABASE ... SET ...- 用户级设置:
ALTER ROLE ... SET ...- postgresql.conf(全局默认)
✅ 二、服务器配置文件详解
2.1 postgresql.conf —— 主配置文件
📌 位置:通常位于
$PGDATA/postgresql.conf
📌 语法:
1# 注释 2parameter = value # 字符串值加引号 3parameter = 'value' # 数值、布尔值可不加 4parameter = 123 5parameter = on/off # 布尔值 6
✅ 案例1:查看和修改配置
1# ✅ 查看当前配置文件路径 2psql -U postgres -c "SHOW config_file;" 3 4# 输出示例: 5# config_file 6# --------------------------------------- 7# /var/lib/pgsql/16/data/postgresql.conf 8 9# ✅ 查看某参数当前值 10psql -U postgres -c "SHOW shared_buffers;" 11 12# ✅ 在SQL中临时修改(会话级) 13SET work_mem = '64MB'; 14SHOW work_mem; 15 16# ✅ 永久修改(需编辑 postgresql.conf) 17# 编辑文件: 18nano /var/lib/pgsql/16/data/postgresql.conf 19 20# 修改后重载(无需重启): 21pg_ctl reload -D /var/lib/pgsql/16/data 22 23# 或使用SQL: 24SELECT pg_reload_conf(); -- 返回 true 表示成功 25
✅ 常用查看命令:
1SHOW ALL; -- 查看所有参数 2SHOW shared_buffers; -- 查看特定参数 3SELECT name, setting, unit FROM pg_settings WHERE name LIKE '%work%'; 4
✅ 三、连接与认证配置(pg_hba.conf)
📌 功能:控制哪些主机、用户、数据库可以连接,以及认证方式。
📌 语法:
1# TYPE DATABASE USER ADDRESS METHOD [OPTIONS] 2host all all 192.168.1.0/24 md5 3local mydb myuser peer 4hostssl all all 0.0.0.0/0 scram-sha-256 5
✅ 案例2:配置远程连接与认证
1# ✅ 编辑 pg_hba.conf 2nano /var/lib/pgsql/16/data/pg_hba.conf 3 4# 添加以下行(允许192.168.1网段MD5密码登录) 5host all all 192.168.1.0/24 md5 6 7# 允许本地socket使用peer认证(操作系统用户映射) 8local all all peer 9 10# 允许所有IP通过SSL+SCRAM-SHA-256登录(生产推荐) 11hostssl all all 0.0.0.0/0 scram-sha-256 12 13# ✅ 修改 postgresql.conf 允许监听所有IP 14listen_addresses = '*' # 默认是 'localhost' 15 16# ✅ 重启服务使配置生效 17sudo systemctl restart postgresql-16 18 19# ✅ 测试远程连接(从192.168.1.100) 20psql -h 192.168.1.50 -U myuser -d mydb 21
✅ 认证方式说明:
peer:本地socket,使用操作系统用户名md5:密码MD5加密(不推荐,易受中间人攻击)scram-sha-256:PostgreSQL 10+ 推荐,更安全的密码认证cert:SSL证书认证reject:拒绝连接
✅ 四、资源消耗配置
📌 关键参数:控制内存、CPU、并发资源使用。
✅ 案例3:配置内存与并发
1# postgresql.conf 2 3# 共享内存缓冲区(建议物理内存25%) 4shared_buffers = 4GB 5 6# 临时缓冲区(每个会话的临时表使用) 7temp_buffers = 32MB 8 9# 每个操作(排序、哈希)可用内存 10work_mem = 64MB # 注意:每个操作独立,高并发时总内存 = work_mem * 并发操作数 11 12# 维护操作内存(VACUUM, CREATE INDEX) 13maintenance_work_mem = 2GB 14 15# 自动清理内存 16autovacuum_work_mem = -1 # -1 表示使用 maintenance_work_mem 17 18# 最大连接数 19max_connections = 200 20 21# 超级用户保留连接 22superuser_reserved_connections = 3 23 24# 空闲事务超时(防连接泄漏) 25idle_in_transaction_session_timeout = 10min 26
✅ 动态查看资源使用:
1-- 查看当前连接数 2SELECT count(*) FROM pg_stat_activity; 3 4-- 查看活跃连接 5SELECT pid, usename, application_name, state, query 6FROM pg_stat_activity 7WHERE state = 'active'; 8 9-- 查看等待的锁 10SELECT * FROM pg_locks WHERE granted = false; 11
✅ 五、预写式日志(WAL)配置
📌 WAL(Write-Ahead Logging):保证数据一致性和崩溃恢复的核心机制。
✅ 案例4:配置 WAL 参数
1# postgresql.conf 2 3# WAL 日志级别(replica 支持流复制) 4wal_level = replica 5 6# 同步提交(on=强一致,off=高性能,remote_write=折中) 7synchronous_commit = on 8 9# WAL 缓冲区大小(默认 -1 = shared_buffers 的 1/32) 10wal_buffers = 16MB 11 12# 检查点间隔(时间 or WAL 大小触发) 13checkpoint_timeout = 30min 14max_wal_size = 4GB 15min_wal_size = 1GB 16 17# 归档模式(用于PITR备份) 18archive_mode = on 19archive_command = 'cp %p /backup/wal/%f' # Linux 示例 20# archive_command = 'copy "%p" "C:\\backup\\wal\\%f"' # Windows 示例 21
✅ 监控 WAL 状态:
1-- 查看 WAL 写入统计 2SELECT * FROM pg_stat_bgwriter; 3 4-- 查看当前 WAL LSN(日志序列号) 5SELECT pg_current_wal_lsn(); 6 7-- 查看 WAL 文件大小(需超级用户) 8SELECT pg_size_pretty(pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn); 9
✅ 六、查询规划配置
📌 控制优化器行为,影响执行计划选择。
✅ 案例5:配置查询优化器
1# postgresql.conf 2 3# 统计信息目标(越高越精确,但 ANALYZE 越慢) 4default_statistics_target = 100 5 6# 随机页成本(SSD 环境降低) 7random_page_cost = 1.1 # HDD 默认4.0,SSD 建议1.1 8 9# 顺序页成本 10seq_page_cost = 1.0 11 12# CPU 成本 13cpu_tuple_cost = 0.01 14cpu_index_tuple_cost = 0.005 15cpu_operator_cost = 0.0025 16 17# 并行查询 18max_parallel_workers_per_gather = 4 19parallel_setup_cost = 1000 20parallel_tuple_cost = 0.1 21 22# JIT 编译(PostgreSQL 11+) 23jit = on 24jit_above_cost = 100000 25
✅ 强制使用索引示例:
1-- 如果优化器错误选择全表扫描,可临时调整成本 2SET enable_seqscan = off; -- 强制不使用顺序扫描(调试用,生产慎用!) 3EXPLAIN SELECT * FROM employees WHERE salary > 80000; 4SET enable_seqscan = on; -- 恢复默认 5
✅ 七、错误报告和日志配置
📌 PostgreSQL 16 增强日志功能,便于监控和调试。
✅ 案例6:配置详细日志
1# postgresql.conf 2 3# 日志目的地 4log_destination = 'stderr' 5 6# 启用日志收集器 7logging_collector = on 8 9# 日志目录和文件名 10log_directory = 'log' 11log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' 12 13# 日志轮转 14log_rotation_age = 1d 15log_rotation_size = 100MB 16 17# 记录慢查询(>=1秒) 18log_min_duration_statement = 1000 19 20# 记录所有 DDL 21log_statement = 'ddl' # none, ddl, mod, all 22 23# 记录连接/断开 24log_connections = on 25log_disconnections = on 26 27# 记录锁等待 28log_lock_waits = on 29 30# PostgreSQL 16 新增:记录错误参数 31log_parameter_max_length_on_error = 1024 32 33# 记录临时文件(>=0字节) 34log_temp_files = 0 35 36# 记录检查点 37log_checkpoints = on 38
✅ 查看日志:
1# 实时查看日志 2tail -f /var/lib/pgsql/16/data/log/postgresql-*.log 3 4# 查找慢查询 5grep "duration:" /var/lib/pgsql/16/data/log/postgresql-*.log | grep " > 1000ms" 6
✅ 在SQL中查看日志相关参数:
1SHOW log_min_duration_statement; 2SHOW log_statement; 3
✅ 八、运行时统计配置
📌 统计收集器:收集数据库活动信息,用于性能分析。
✅ 案例7:启用统计收集
1# postgresql.conf 2 3# 启用统计收集器 4track_activities = on 5track_counts = on 6track_io_timing = on # I/O 时间统计(需 pg_stat_statements) 7track_functions = all # pl, all, none 8 9# 统计信息保留时间 10stats_temp_directory = 'pg_stat_tmp' # 建议放在内存盘(如 /dev/shm/pg_stat_tmp) 11
✅ 查看统计信息:
1-- 查看数据库统计 2SELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit 3FROM pg_stat_database; 4 5-- 查看表统计 6SELECT schemaname, tablename, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del 7FROM pg_stat_user_tables; 8 9-- 查看索引使用情况 10SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch 11FROM pg_stat_user_indexes; 12
✅ 九、自动清理(Autovacuum)配置
📌 自动清理:回收死元组空间,更新统计信息,防止表膨胀。
✅ 案例8:配置自动清理
1# postgresql.conf 2 3# 启用自动清理 4autovacuum = on 5 6# 自动分析(更新统计信息) 7autoanalyze = on 8 9# 触发自动清理的阈值 10autovacuum_vacuum_threshold = 50 11autovacuum_vacuum_scale_factor = 0.2 # 表变化20%触发 12 13# 触发自动分析的阈值 14autovacuum_analyze_threshold = 50 15autovacuum_analyze_scale_factor = 0.1 # 表变化10%触发 16 17# 并行自动清理 18autovacuum_max_workers = 6 19autovacuum_naptime = 1min 20autovacuum_vacuum_cost_limit = 2000 21
✅ 手动触发清理:
1-- 清理并分析表 2VACUUM ANALYZE employees; 3 4-- 仅清理(不锁表) 5VACUUM employees; 6 7-- 强制清理(锁表,重建索引) 8VACUUM FULL employees; 9 10-- 仅分析(更新统计信息) 11ANALYZE employees; 12
✅ 监控自动清理状态:
1SELECT schemaname, tablename, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze 2FROM pg_stat_user_tables 3WHERE tablename = 'employees'; 4
✅ 十、客户端连接默认配置
📌 控制新连接的默认行为。
✅ 案例9:设置客户端默认值
1# postgresql.conf 2 3# 默认事务隔离级别 4default_transaction_isolation = 'read committed' 5 6# 默认事务只读 7default_transaction_read_only = off 8 9# 默认时区 10timezone = 'UTC' 11 12# 默认字符编码 13client_encoding = 'UTF8' 14 15# 默认日期风格 16datestyle = 'ISO, MDY' 17 18# 默认搜索路径 19search_path = '"$user", public' 20
✅ 在SQL中设置用户/数据库级默认值:
1-- 为用户设置默认 search_path 2ALTER ROLE myuser SET search_path = 'myschema, public'; 3 4-- 为数据库设置默认 timezone 5ALTER DATABASE mydb SET timezone = 'Asia/Shanghai'; 6 7-- 查看用户设置 8SELECT rolname, rolconfig FROM pg_roles WHERE rolname = 'myuser'; 9
✅ 十一、锁管理配置
📌 控制锁等待行为和死锁检测。
✅ 案例10:配置锁参数
1# postgresql.conf 2 3# 死锁检测时间(毫秒) 4deadlock_timeout = 1000 5 6# 锁等待超时(语句级) 7lock_timeout = 0 # 0=无限等待,建议生产设为 5000(5秒) 8 9# 语句超时 10statement_timeout = 30s # 30秒超时(防慢查询拖垮系统) 11 12# 空闲事务超时 13idle_in_transaction_session_timeout = 10min 14
✅ 监控锁等待:
1-- 查看当前锁 2SELECT 3locktype, 4relation::regclass, 5mode, 6granted, 7pid, 8query 9FROM pg_locks l 10JOIN pg_stat_activity a ON l.pid = a.pid 11WHERE NOT granted; -- 未授予的锁(等待中) 12 13-- 查看阻塞链 14SELECT 15blocked_locks.pid AS blocked_pid, 16blocked_activity.query AS blocked_query, 17blocking_locks.pid AS blocking_pid, 18blocking_activity.query AS blocking_query 19FROM pg_catalog.pg_locks blocked_locks 20JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid 21JOIN pg_catalog.pg_locks blocking_locks 22ON blocking_locks.locktype = blocked_locks.locktype 23AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database 24AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation 25AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page 26AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple 27AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid 28AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid 29AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid 30AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid 31AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid 32AND blocking_locks.pid != blocked_locks.pid 33JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid 34WHERE NOT blocked_locks.granted; 35
✅ 十二、版本和平台兼容性配置
📌 确保跨版本/平台兼容性。
✅ 案例11:配置兼容性参数
1# postgresql.conf 2 3# SQL 标准兼容性 4sql_standard_conforming_strings = on 5 6# 旧版本兼容性(如8.3行为) 7array_nulls = on 8 9# 日期时间兼容性 10DateStyle = 'ISO, MDY' 11 12# 字符串函数兼容性 13standard_conforming_strings = on 14 15# 整数除法行为 16div_precision_increment = 0 17
✅ 查看版本信息:
1SELECT version(); 2-- PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605, 64-bit 3 4SHOW server_version; 5SHOW server_version_num; 6
✅ 十三、监控数据库的活动
📌 实时监控连接、查询、锁、资源使用。
✅ 案例12:监控活跃会话
1-- ✅ 查看所有活动会话 2SELECT 3 pid, 4 usename, 5 application_name, 6 client_addr, 7 backend_start, 8 state, 9 state_change, 10 query, 11 wait_event_type, 12 wait_event 13FROM pg_stat_activity 14WHERE state != 'idle' 15ORDER BY state_change DESC; 16 17-- ✅ 查看长时间运行的查询(>5分钟) 18SELECT 19 pid, 20 now() - query_start as duration, 21 usename, 22 query 23FROM pg_stat_activity 24WHERE state = 'active' 25 AND now() - query_start > interval '5 minutes' 26ORDER BY duration DESC; 27 28-- ✅ 查看空闲事务(可能泄漏) 29SELECT 30 pid, 31 now() - state_change as idle_duration, 32 usename, 33 query 34FROM pg_stat_activity 35WHERE state = 'idle in transaction' 36ORDER BY idle_duration DESC; 37 38-- ✅ 终止慢查询(需超级用户) 39SELECT pg_terminate_backend(pid) 40FROM pg_stat_activity 41WHERE now() - query_start > interval '10 minutes' 42 AND state = 'active'; 43
✅ 十四、配置统计收集器
📌 启用详细统计收集(需重启)。
✅ 案例13:启用 pg_stat_statements(最常用扩展)
1-- ✅ 安装扩展(需在 postgresql.conf 中配置 shared_preload_libraries) 2-- 先修改 postgresql.conf: 3-- shared_preload_libraries = 'pg_stat_statements' 4 5-- 重启 PostgreSQL 6sudo systemctl restart postgresql-16 7 8-- ✅ 创建扩展 9CREATE EXTENSION pg_stat_statements; 10 11-- ✅ 查看最耗时的SQL(按总时间排序) 12SELECT 13 query, 14 calls, 15 total_exec_time, 16 mean_exec_time, 17 rows, 18 shared_blks_hit, 19 shared_blks_read 20FROM pg_stat_statements 21ORDER BY total_exec_time DESC 22LIMIT 10; 23 24-- ✅ 重置统计信息 25SELECT pg_stat_statements_reset(); 26
✅ postgresql.conf 配置:
1shared_preload_libraries = 'pg_stat_statements' 2pg_stat_statements.track = all # top, all, none 3pg_stat_statements.max = 10000 # 最多跟踪的SQL数 4pg_stat_statements.save = on # 重启后保留统计 5
✅ 十五、查看收集到的统计信息
📌 系统视图:
pg_stat_*,pg_statio_*
✅ 案例14:全面统计分析
1-- ✅ 数据库级统计 2SELECT 3 datname, 4 numbackends AS connections, 5 xact_commit, 6 xact_rollback, 7 blks_read, 8 blks_hit, 9 pg_size_pretty(blks_read * 8192) AS read_bytes, 10 pg_size_pretty(blks_hit * 8192) AS hit_bytes, 11 (blks_hit * 100.0 / (blks_hit + blks_read)) AS hit_ratio 12FROM pg_stat_database 13WHERE datname NOT IN ('template0', 'template1'); 14 15-- ✅ 表级IO统计 16SELECT 17 schemaname, 18 tablename, 19 heap_blks_read, 20 heap_blks_hit, 21 idx_blks_read, 22 idx_blks_hit, 23 toast_blks_read, 24 toast_blks_hit, 25 tidx_blks_read, 26 tidx_blks_hit 27FROM pg_statio_user_tables 28ORDER BY heap_blks_read DESC; 29 30-- ✅ 索引使用率(识别无用索引) 31SELECT 32 schemaname, 33 tablename, 34 indexname, 35 idx_scan AS scans, 36 pg_size_pretty(pg_relation_size(indexrelid)) AS index_size 37FROM pg_stat_user_indexes 38WHERE idx_scan < 50 -- 使用次数少于50次 39ORDER BY pg_relation_size(indexrelid) DESC; 40 41-- ✅ 查询最频繁的表 42SELECT 43 schemaname, 44 tablename, 45 seq_scan + idx_scan AS total_scans, 46 n_tup_ins AS inserts, 47 n_tup_upd AS updates, 48 n_tup_del AS deletes 49FROM pg_stat_user_tables 50ORDER BY total_scans DESC 51LIMIT 10; 52
✅ 十六、监控磁盘的使用
📌 监控表空间、数据库、表、索引的磁盘占用。
✅ 案例15:监控磁盘使用量
1-- ✅ 查看数据库大小 2SELECT 3 datname, 4 pg_size_pretty(pg_database_size(datname)) AS size 5FROM pg_database 6ORDER BY pg_database_size(datname) DESC; 7 8-- ✅ 查看表空间大小 9SELECT 10 spcname, 11 pg_size_pretty(pg_tablespace_size(oid)) AS size 12FROM pg_tablespace; 13 14-- ✅ 查看当前数据库中所有表大小(含索引) 15SELECT 16 schemaname, 17 tablename, 18 pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size, 19 pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS table_size, 20 pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename) - pg_relation_size(schemaname || '.' || tablename)) AS index_size 21FROM pg_stat_user_tables 22ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC; 23 24-- ✅ 查看单个表详细大小 25SELECT 26 'table' AS type, 27 pg_size_pretty(pg_relation_size('employees')) AS size 28UNION ALL 29SELECT 30 'indexes', 31 pg_size_pretty(pg_indexes_size('employees')) 32UNION ALL 33SELECT 34 'toast', 35 pg_size_pretty(pg_total_relation_size('employees') - pg_relation_size('employees') - pg_indexes_size('employees')) 36UNION ALL 37SELECT 38 'total', 39 pg_size_pretty(pg_total_relation_size('employees')); 40 41-- ✅ 查看最大表(整个实例) 42SELECT 43 schemaname, 44 tablename, 45 pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size 46FROM pg_stat_user_tables 47ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC 48LIMIT 10; 49
✅ 十七、磁盘满导致的失效
📌 磁盘满会导致:
- 无法写入WAL → 数据库崩溃
- 无法创建临时文件 → 查询失败
- 无法写入日志 → 无法诊断问题
✅ 案例16:模拟和处理磁盘满
1-- ✅ 检查磁盘空间(SQL内无法直接查,需结合OS命令) 2-- 在操作系统执行: 3df -h /var/lib/pgsql/16/data 4 5-- ✅ 如果磁盘满,紧急处理: 6-- 1. 清理 pg_log(如果日志在数据目录) 7rm /var/lib/pgsql/16/data/log/postgresql-*.log 8 9-- 2. 清理 pg_wal/archive_status(如果归档卡住) 10rm /var/lib/pgsql/16/data/pg_wal/archive_status/*.ready 11 12-- 3. 扩大磁盘或迁移表空间(长期方案) 13 14-- ✅ 在SQL中监控表膨胀(可能导致磁盘满) 15SELECT 16 schemaname, 17 tablename, 18 pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size, 19 n_dead_tup, 20 n_live_tup, 21 (n_dead_tup * 100.0 / (n_live_tup + n_dead_tup + 1)) AS dead_ratio 22FROM pg_stat_user_tables 23WHERE n_dead_tup > 1000 24ORDER BY n_dead_tup DESC; 25 26-- ✅ 清理膨胀表 27VACUUM FULL VERBOSE ANALYZE employees; -- 锁表,慎用! 28-- 或使用 pg_repack(在线清理,需安装扩展) 29
✅ 预防措施:
- 设置监控告警(磁盘>80%告警)
- 定期
VACUUM和REINDEX- 使用
pg_cron定期清理日志- 分离 WAL、日志、数据到不同磁盘
✅ 十八、综合实战案例
🎯 案例1:一键生成数据库健康报告
1-- 创建健康检查函数 2CREATE OR REPLACE FUNCTION get_db_health_report() 3RETURNS TABLE( 4 category TEXT, 5 item TEXT, 6 value TEXT, 7 recommendation TEXT 8) AS $$ 9BEGIN 10 -- 连接数检查 11 RETURN QUERY 12 SELECT 13 'Connections'::TEXT, 14 'Active Connections'::TEXT, 15 count(*)::TEXT, 16 CASE 17 WHEN count(*) > (SELECT setting::INT * 0.8 FROM pg_settings WHERE name = 'max_connections') 18 THEN '接近最大连接数,考虑增加 max_connections 或优化应用' 19 ELSE '正常' 20 END 21 FROM pg_stat_activity WHERE state = 'active'; 22 23 -- 锁等待检查 24 RETURN QUERY 25 SELECT 26 'Locks'::TEXT, 27 'Blocking Sessions'::TEXT, 28 count(*)::TEXT, 29 CASE 30 WHEN count(*) > 0 THEN '存在阻塞会话,需排查' 31 ELSE '正常' 32 END 33 FROM pg_locks WHERE NOT granted; 34 35 -- 表膨胀检查 36 RETURN QUERY 37 SELECT 38 'Bloat'::TEXT, 39 tablename::TEXT, 40 pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename))::TEXT, 41 '建议执行 VACUUM FULL 或使用 pg_repack' 42 FROM pg_stat_user_tables 43 WHERE n_dead_tup > 10000 44 AND (n_dead_tup * 100.0 / (n_live_tup + n_dead_tup + 1)) > 20 45 ORDER BY n_dead_tup DESC 46 LIMIT 5; 47 48 -- 无用索引检查 49 RETURN QUERY 50 SELECT 51 'Indexes'::TEXT, 52 indexname::TEXT, 53 '使用次数: ' || idx_scan::TEXT, 54 '考虑删除此索引以节省空间和写性能' 55 FROM pg_stat_user_indexes 56 WHERE idx_scan < 50 57 ORDER BY pg_relation_size(indexrelid) DESC 58 LIMIT 5; 59 60 -- 磁盘空间检查(需超级用户) 61 RETURN QUERY 62 SELECT 63 'Disk'::TEXT, 64 'Database Size'::TEXT, 65 pg_size_pretty(pg_database_size(current_database()))::TEXT, 66 CASE 67 WHEN pg_database_size(current_database()) > 100*1024^3 THEN '数据库超过100GB,考虑分区或归档' 68 ELSE '正常' 69 END; 70END; 71$$ LANGUAGE plpgsql; 72 73-- ✅ 执行健康检查 74SELECT * FROM get_db_health_report(); 75
🎯 案例2:自动化磁盘监控脚本
1#!/bin/bash 2# disk_monitor.sh 3 4PGDATA="/var/lib/pgsql/16/data" 5THRESHOLD=80 # 80% 告警 6ADMIN_EMAIL="admin@company.com" 7 8# 获取磁盘使用率 9USAGE=$(df $PGDATA | tail -1 | awk '{print $5}' | sed 's/%//') 10 11echo "PostgreSQL 数据目录磁盘使用率: ${USAGE}%" 12 13if [ $USAGE -gt $THRESHOLD ]; then 14 echo "⚠️ 警告:磁盘使用率超过 ${THRESHOLD}%!" >&2 15 16 # 发送邮件告警 17 echo "PostgreSQL 磁盘空间不足!当前使用率: ${USAGE}%" | \ 18 mail -s "【紧急】PostgreSQL 磁盘告警" $ADMIN_EMAIL 19 20 # 记录到日志 21 echo "$(date): 磁盘使用率 ${USAGE}%,超过阈值 ${THRESHOLD}%" >> /var/log/pg_disk_alert.log 22 23 # 尝试清理(示例) 24 echo "正在清理旧日志..." 25 find $PGDATA/log -name "postgresql-*.log" -mtime +7 -delete 26 27 # 检查大表 28 psql -U postgres -c " 29 SELECT 30 schemaname || '.' || tablename as table_name, 31 pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as size 32 FROM pg_stat_user_tables 33 ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC 34 LIMIT 5; 35 " 36else 37 echo "✅ 磁盘使用率正常" 38fi 39
✅ 设置定时任务:
1# 每小时检查一次 2crontab -e 30 * * * * /path/to/disk_monitor.sh 4
🎯 案例3:完整 PostgreSQL 16 生产配置
1# 🚀 postgresql.conf - PostgreSQL 16 生产环境最佳配置 2 3# 连接 4listen_addresses = '*' 5port = 5432 6max_connections = 300 7superuser_reserved_connections = 5 8idle_in_transaction_session_timeout = 10min 9 10# 认证(pg_hba.conf 配置) 11# hostssl all all 0.0.0.0/0 scram-sha-256 12 13# 内存 14shared_buffers = 8GB # 25% of 32GB RAM 15effective_cache_size = 24GB # 75% of 32GB RAM 16work_mem = 64MB # 300 connections * 64MB = 19.2GB (max) 17maintenance_work_mem = 2GB 18autovacuum_work_mem = -1 19 20# WAL 21wal_level = replica 22synchronous_commit = on 23wal_buffers = 16MB 24checkpoint_timeout = 30min 25max_wal_size = 4GB 26min_wal_size = 1GB 27archive_mode = on 28archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f' 29 30# 查询优化 31random_page_cost = 1.1 # SSD 32effective_io_concurrency = 200 33default_statistics_target = 500 34max_parallel_workers_per_gather = 4 35jit = on 36 37# 日志 38log_destination = 'stderr' 39logging_collector = on 40log_directory = 'log' 41log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' 42log_rotation_age = 1d 43log_min_duration_statement = 1000 # 1秒慢查询 44log_checkpoints = on 45log_connections = on 46log_disconnections = on 47log_lock_waits = on 48log_temp_files = 0 49log_parameter_max_length_on_error = 1024 # PG16 50 51# 统计 52track_activities = on 53track_counts = on 54track_io_timing = on 55stats_temp_directory = '/dev/shm/pg_stat_tmp' # RAM disk 56 57# 自动清理 58autovacuum = on 59autovacuum_max_workers = 6 60autovacuum_naptime = 1min 61autovacuum_vacuum_scale_factor = 0.1 62autovacuum_analyze_scale_factor = 0.05 63 64# 锁与超时 65deadlock_timeout = 1s 66lock_timeout = 5s 67statement_timeout = 30s 68 69# 扩展 70shared_preload_libraries = 'pg_stat_statements,pg_cron' 71pg_stat_statements.track = all 72pg_stat_statements.max = 10000 73pg_stat_statements.save = on 74
✅ 十九、常见问题及解答(FAQ)
❓ 疑问1:当服务器配置出现冲突时,采用什么方式的优先级?
答案:PostgreSQL 配置遵循 “就近原则”,优先级从高到低:
- 会话内设置(
SET LOCAL)→ 仅当前事务有效- 会话设置(
SET)→ 当前连接有效- 数据库设置(
ALTER DATABASE ... SET ...)- 用户设置(
ALTER ROLE ... SET ...)- postgresql.conf → 全局默认值
✅ 案例演示优先级:
1-- ✅ 1. 查看全局默认值 2SHOW work_mem; -- 假设 postgresql.conf 中设置为 '4MB' 3 4-- ✅ 2. 用户级设置(需超级用户) 5ALTER ROLE myuser SET work_mem = '32MB'; 6 7-- 重新连接后: 8SHOW work_mem; -- 显示 '32MB' 9 10-- ✅ 3. 数据库级设置(覆盖用户设置) 11ALTER DATABASE mydb SET work_mem = '64MB'; 12 13-- 重新连接 mydb 后: 14SHOW work_mem; -- 显示 '64MB' 15 16-- ✅ 4. 会话级设置(覆盖所有) 17SET work_mem = '128MB'; 18SHOW work_mem; -- 显示 '128MB' 19 20-- ✅ 5. 事务级设置(最高优先级) 21BEGIN; 22SET LOCAL work_mem = '256MB'; 23SHOW work_mem; -- 显示 '256MB' 24COMMIT; 25SHOW work_mem; -- 恢复为 '128MB' 26 27-- ✅ 6. 重置为默认 28RESET work_mem; 29SHOW work_mem; -- 恢复为数据库级 '64MB' 30
✅ 查看参数来源:
1SELECT name, setting, source 2FROM pg_settings 3WHERE name = 'work_mem'; 4-- source 列显示:configuration file, database, user, session, override 等 5
📌 黄金法则:
越具体的设置优先级越高!事务 > 会话 > 数据库 > 用户 > 全局
❓ 疑问2:为什么有时候磁盘没有写满性能仍然很低?
答案:磁盘性能瓶颈 ≠ 磁盘空间不足!常见原因:
- I/O 等待高:磁盘响应慢(HDD、SSD老化、RAID降级)
- 缓冲区命中率低:
shared_buffers太小,频繁读磁盘- 表膨胀:死元组占用空间,扫描效率低
- 索引失效:统计信息过期,优化器选错计划
- 锁竞争:大量行锁/表锁导致等待
- WAL 写入瓶颈:
synchronous_commit=on且磁盘慢
✅ 案例:诊断性能问题
1-- ✅ 1. 检查缓冲区命中率(应>90%) 2SELECT 3 sum(blks_hit) * 100.0 / (sum(blks_hit) + sum(blks_read)) AS hit_ratio 4FROM pg_stat_database; 5 6-- ✅ 2. 检查表膨胀 7SELECT 8 schemaname, 9 tablename, 10 n_dead_tup, 11 (n_dead_tup * 100.0 / (n_live_tup + n_dead_tup + 1)) AS dead_ratio 12FROM pg_stat_user_tables 13WHERE n_dead_tup > 1000 14ORDER BY dead_ratio DESC; 15 16-- ✅ 3. 检查I/O等待(需 track_io_timing=on) 17EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM large_table WHERE condition; 18-- 查看 "I/O Timings: read=xxx" 是否很高 19 20-- ✅ 4. 检查锁等待 21SELECT * FROM pg_locks WHERE NOT granted; 22 23-- ✅ 5. 检查慢查询 24SELECT 25 query, 26 total_time, 27 calls, 28 mean_time 29FROM pg_stat_statements 30ORDER BY mean_time DESC 31LIMIT 10; 32 33-- ✅ 6. 检查WAL写入(后台写入器统计) 34SELECT 35 checkpoints_timed, 36 checkpoints_req, 37 checkpoint_write_time, 38 checkpoint_sync_time 39FROM pg_stat_bgwriter; 40-- checkpoint_sync_time 高表示磁盘同步慢 41
✅ 解决方案:
- I/O慢:升级到NVMe SSD,调整
random_page_cost=1.1- 缓冲区命中低:增大
shared_buffers,预热关键表- 表膨胀:定期
VACUUM,调整autovacuum参数- 索引失效:定期
ANALYZE,调整default_statistics_target- 锁竞争:优化事务,减少长事务,设置
lock_timeout- WAL瓶颈:考虑
synchronous_commit=off(可接受少量丢失),或使用更快磁盘
📌 关键指标:
- 缓冲区命中率 > 95%
- 表膨胀率 < 10%
- 平均查询时间 < 100ms
- 锁等待次数 = 0
✅ 二十、监控最佳实践总结
- 配置先行:
- 合理设置
postgresql.conf和pg_hba.conf - 启用
pg_stat_statements和详细日志
- 合理设置
- 实时监控:
- 使用
pg_stat_activity监控活跃会话 - 使用
pg_locks监控锁等待 - 设置
statement_timeout防止慢查询拖垮系统
- 使用
- 定期检查:
- 每日检查磁盘空间
- 每周检查表膨胀和索引使用率
- 每月审查慢查询日志
- 自动化告警:
- 磁盘空间 > 80% 告警
- 连接数 > 80% 告警
- 锁等待 > 30秒 告警
- PostgreSQL 16 新特性:
- 利用
log_parameter_max_length_on_error增强错误诊断 - 使用更详细的统计信息优化查询
- 利用
🚀 终极口诀:
配置要合理,监控要实时,告警要及时,优化要持续!
📚 建议结合 Prometheus + Grafana + pg_exporter 构建可视化监控大盘!
《PostgreSQL 入门学习教程,从入门到精通,PostgreSQL 16 服务器配置与数据库监控终极指南 —语法、案例与实战(18)》 是转载文章,点击查看原文。
