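In principle, MVCC needs to be able to look up a transaction's status given its transaction ID.
In PG, the transaction status can be obtained through several paths:
- Looking it up in the snapshot (active transactions)
- Looking it up in the status bits of the tuple header (inactive transactions)
- Looking it up in CLOG (inactive transactions)

Setting the implementation aside and looking only at the concept, the commit status of an inactive transaction could also be found in XLOG. CLOG can be viewed as a cache or mapping of the commit/rollback records in XLOG: a fast way to look up whether a transaction committed.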
So under the write-WAL-before-data rule, CLOG is treated as data as well; only XLOG counts as WAL.
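The three lookup paths listed above can be sketched as a toy model. Every type and function here (`ToySnapshot`, `ToyTupleHeader`, `ToyClog`, `toy_xmin_status`) is hypothetical and made up for illustration; the order of checks in PostgreSQL's real visibility code is more involved, and the caching of the CLOG result in the tuple header is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; these are NOT PostgreSQL's real structures. */
typedef uint32_t ToyXid;

typedef enum { TOY_IN_PROGRESS, TOY_COMMITTED, TOY_ABORTED } ToyXactStatus;

typedef struct {
    const ToyXid *active;   /* xids still running when the snapshot was taken */
    size_t        nactive;
} ToySnapshot;

/* Status bits cached in the tuple header. */
enum { TOY_HINT_COMMITTED = 1 << 0, TOY_HINT_ABORTED = 1 << 1 };

typedef struct {
    ToyXid   xmin;    /* transaction that inserted the tuple */
    unsigned hints;   /* cached status of xmin, 0 if unknown */
} ToyTupleHeader;

/* Toy CLOG: one status entry per xid, indexed directly by xid. */
typedef struct {
    const ToyXactStatus *status;
    size_t               n;
} ToyClog;

static bool
toy_xid_in_snapshot(const ToySnapshot *snap, ToyXid xid)
{
    for (size_t i = 0; i < snap->nactive; i++)
        if (snap->active[i] == xid)
            return true;
    return false;
}

ToyXactStatus
toy_xmin_status(const ToySnapshot *snap, ToyTupleHeader *tup, const ToyClog *clog)
{
    /* Path 1: the snapshot answers for active transactions. */
    if (toy_xid_in_snapshot(snap, tup->xmin))
        return TOY_IN_PROGRESS;

    /* Path 2: status bits in the tuple header, if already set. */
    if (tup->hints & TOY_HINT_COMMITTED)
        return TOY_COMMITTED;
    if (tup->hints & TOY_HINT_ABORTED)
        return TOY_ABORTED;

    /* Path 3: fall back to CLOG and cache the answer in the header. */
    assert(tup->xmin < clog->n);
    ToyXactStatus st = clog->status[tup->xmin];
    if (st == TOY_COMMITTED)
        tup->hints |= TOY_HINT_COMMITTED;
    else if (st == TOY_ABORTED)
        tup->hints |= TOY_HINT_ABORTED;
    return st;
}
```

In this sketch the CLOG is only consulted when the cheaper sources have no answer, which is the sense in which it acts as a fast lookup table rather than as a log.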
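PostgreSQL reads and writes CLOG through the SLRU mechanism. Before an SLRU page is written to disk, there is a mechanism that guarantees the XLOG is written first: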
group_lsn records the largest log sequence number (LSN) within each group of 32 transactions. It is mainly used when transaction commits are not flushed synchronously (asynchronous commit).

    static bool
    SlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruWriteAll fdata)
    {
        ...
        if (shared->group_lsn != NULL)
        {
            /*
             * We must determine the largest async-commit LSN for the page. This
             * is a bit tedious, but since this entire function is a slow path
             * anyway, it seems better to do this here than to maintain a per-page
             * LSN variable (which'd need an extra comparison in the
             * transaction-commit path).
             */
            XLogRecPtr  max_lsn;
            int         lsnindex,
                        lsnoff;

            lsnindex = slotno * shared->lsn_groups_per_page;
            max_lsn = shared->group_lsn[lsnindex++];
            for (lsnoff = 1; lsnoff < shared->lsn_groups_per_page; lsnoff++)
            {
                XLogRecPtr  this_lsn = shared->group_lsn[lsnindex++];

                if (max_lsn < this_lsn)
                    max_lsn = this_lsn;     /* <<<<<<<< find the largest LSN */
            }

            if (!XLogRecPtrIsInvalid(max_lsn))
            {
                /*
                 * As noted above, elog(ERROR) is not acceptable here, so if
                 * XLogFlush were to fail, we must PANIC. This isn't much of a
                 * restriction because XLogFlush is just about all critical
                 * section anyway, but let's make sure.
                 */
                START_CRIT_SECTION();
                XLogFlush(max_lsn);         /* <<<<<<<< first make sure XLOG is flushed up to this point! */
                END_CRIT_SECTION();
            }
        }
        ...
        if (pg_pwrite(fd, shared->page_buffer[slotno], BLCKSZ, offset) != BLCKSZ)
        {
            ...
        }
    }
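To make the "32 transactions per group" layout concrete, here is a small arithmetic sketch of how a transaction ID could map to a CLOG page, a byte within that page, and a group_lsn slot. The `TOY_` constants and function names are hypothetical. The sketch assumes the default 8 kB block size and 2 status bits per transaction, which is how I understand the constants in clog.c, so check them against the PostgreSQL version you are reading.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions of this sketch, modelled on my reading of clog.c. */
#define TOY_BLCKSZ              8192    /* default block size */
#define TOY_XACTS_PER_BYTE      4       /* 2 status bits per transaction */
#define TOY_XACTS_PER_PAGE      (TOY_BLCKSZ * TOY_XACTS_PER_BYTE)
#define TOY_XACTS_PER_LSN_GROUP 32      /* the "group of 32" from the text */
#define TOY_LSNS_PER_PAGE       (TOY_XACTS_PER_PAGE / TOY_XACTS_PER_LSN_GROUP)

typedef struct {
    uint32_t pageno;      /* which CLOG page holds this xid */
    int      byteno;      /* byte offset within that page */
    int      bshift;      /* bit shift of the 2-bit status inside the byte */
    int      lsn_group;   /* which LSN group on the page covers this xid */
} ToyClogPos;

/* Locate the status bits and the LSN group for a transaction ID. */
ToyClogPos
toy_clog_locate(uint32_t xid)
{
    ToyClogPos pos;
    uint32_t   pgindex = xid % TOY_XACTS_PER_PAGE;

    pos.pageno = xid / TOY_XACTS_PER_PAGE;
    pos.byteno = (int) (pgindex / TOY_XACTS_PER_BYTE);
    pos.bshift = (int) (xid % TOY_XACTS_PER_BYTE) * 2;
    pos.lsn_group = (int) (pgindex / TOY_XACTS_PER_LSN_GROUP);
    return pos;
}

/*
 * Index into a flat group_lsn array when the page sits in buffer slot
 * `slotno`; this matches the `slotno * lsn_groups_per_page` start index
 * used by the loop in the excerpt above.
 */
int
toy_group_lsn_index(int slotno, uint32_t xid)
{
    assert(slotno >= 0);
    return slotno * TOY_LSNS_PER_PAGE + toy_clog_locate(xid).lsn_group;
}
```

Under these assumptions a page covers 32768 transactions and carries 1024 LSN groups, which is why the write path has to scan a whole run of group_lsn entries to find the page-wide maximum.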
Data pages follow the same rule: first find the page's LSN, flush XLOG up to it, and only then write the data.
    static void
    FlushBuffer(BufferDesc *buf, SMgrRelation reln)
    {
        ...
        buf_state = LockBufHdr(buf);

        /*
         * Run PageGetLSN while holding header lock, since we don't have the
         * buffer locked exclusively in all cases.
         */
        recptr = BufferGetLSN(buf);     /* <<<<<<<< find the page's LSN */

        /* To check if block content changes while flushing. - vadim 01/17/97 */
        buf_state &= ~BM_JUST_DIRTIED;
        UnlockBufHdr(buf, buf_state);

        /*
         * Force XLOG flush up to buffer's LSN. This implements the basic WAL
         * rule that log updates must hit disk before any of the data-file changes
         * they describe do.
         *
         * However, this rule does not apply to unlogged relations, which will be
         * lost after a crash anyway. Most unlogged relation pages do not bear
         * LSNs since we never emit WAL records for them, and therefore flushing
         * up through the buffer LSN would be useless, but harmless. However,
         * GiST indexes use LSNs internally to track page-splits, and therefore
         * unlogged GiST pages bear "fake" LSNs generated by
         * GetFakeLSNForUnloggedRel. It is unlikely but possible that the fake
         * LSN counter could advance past the WAL insertion point; and if it did
         * happen, attempting to flush WAL through that location would fail, with
         * disastrous system-wide consequences. To make sure that can't happen,
         * skip the flush if the buffer isn't permanent.
         */
        if (buf_state & BM_PERMANENT)
            XLogFlush(recptr);          /* <<<<<<<< first make sure XLOG is flushed up to this point! */
        ...
        smgrwrite(reln,
                  BufTagGetForkNum(&buf->tag),
                  buf->tag.blockNum,
                  bufToWrite,
                  false);
        ...
    }
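The ordering that both excerpts enforce can be reduced to a few lines. The following is a minimal sketch of the write-WAL-before-data rule, not PostgreSQL code: `ToyWal`, `ToyPage`, `toy_wal_flush` and `toy_flush_page` are hypothetical names, and "flushing" here only advances a counter instead of touching a disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLsn;

/* Hypothetical toy WAL: tracks only how far the log has been generated and flushed. */
typedef struct {
    ToyLsn inserted;   /* end of WAL generated so far (still in memory) */
    ToyLsn flushed;    /* end of WAL that is durable */
} ToyWal;

/* Hypothetical toy page, standing in for a data buffer or a CLOG page. */
typedef struct {
    ToyLsn lsn;        /* LSN of the last WAL record that changed the page */
    bool   permanent;  /* false models an unlogged relation */
    bool   on_disk;    /* set once the page has been "written" */
} ToyPage;

/* Make the log durable up to `upto`; a no-op if it already is. */
void
toy_wal_flush(ToyWal *wal, ToyLsn upto)
{
    assert(upto <= wal->inserted);  /* cannot flush WAL that was never generated */
    if (upto > wal->flushed)
        wal->flushed = upto;
}

/* Write a page out, flushing the log first when the page is WAL-logged. */
void
toy_flush_page(ToyWal *wal, ToyPage *page)
{
    /* Skip the flush for non-permanent pages, as FlushBuffer does. */
    if (page->permanent)
        toy_wal_flush(wal, page->lsn);

    /* The WAL rule: a logged page never reaches disk ahead of its log records. */
    assert(!page->permanent || wal->flushed >= page->lsn);
    page->on_disk = true;
}
```

The non-permanent branch mirrors the reasoning in the FlushBuffer comment: a fake LSN may lie beyond the WAL insertion point, so trying to flush up to it would trip the first assertion in this model.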