Chanler

Chanler

"Dark Horse MySQL" II. InnoDB Engine

Logical Storage Structure#

image.png|500

In the InnoDB storage engine, each row of data contains not only the values of all fields defined in the table but also several important components:

  1. 6-byte transaction ID trx id
    In InnoDB, each transaction is assigned a unique transaction ID, and each row of data contains the value of this transaction ID, which is used to implement transaction-level multi-version concurrency control (MVCC).
  2. 7-byte rollback pointer roll pointer
    To support transaction rollback operations, each row of data contains a rollback pointer that points to the corresponding record in the rollback log. When a transaction needs to roll back, the rollback pointer can quickly locate the corresponding rollback record and restore the data.
  3. Non-null column bitmap
    The non-null column bitmap is a mechanism used in the InnoDB storage engine to improve storage efficiency. Since many columns in a table often have NULL values, the information of these columns can be represented using a bitmap, thereby compressing storage space as much as possible when storing data.
  4. Record header
    Each row of data contains a record header, which includes some metadata, such as whether the data has been deleted or locked.
  5. Version number
    In InnoDB's MVCC implementation, each row of data also contains a version number to indicate the current version of the row. When a transaction updates a row of data, it actually adds a new record based on the original row of data and distinguishes between the two records using the version number.

In summary, in addition to the values of the fields defined in the table, InnoDB rows also contain a lot of other metadata, which plays an important role in the InnoDB storage engine's implementation of transaction support, MVCC, storage efficiency, and data recovery.

Architecture#

Below is the InnoDB architecture diagram, with the left side showing the memory structure and the right side showing the disk structure.

image.png|500

Memory Structure - Buffer Pool#

Operate on the buffer pool data and refresh it to the disk at a certain frequency to reduce disk IO.

image.png|500

The buffer pool is managed at the page level using a linked list and is divided into three types:

  1. free page: free page, not used
  2. clean page: used page, but data has not been modified
  3. dirty page: dirty page, used page with modified data, at this time the data is inconsistent with the data on disk and needs to be flushed to disk.

Memory Structure - Change Buffer#

Cache data changes in the change buffer. When data is read in the future, the data is merged and restored to the buffer pool, and then the merged data will be flushed to disk.

image.png|500

The significance of the change buffer:
Secondary indexes are usually non-unique and inserted randomly. The range of delete change operations is relatively large, causing a lot of disk IO. Merging data in the buffer pool can reduce disk IO.

Memory Structure - Adaptive Hash Index#

Optimize data queries on the buffer pool.

image.png|500

Memory Structure - Log Buffer#

The log buffer saves redo logs, undo logs, and similar log data, with a default size of 16MB, periodically flushed to disk to reduce frequent disk IO.

image.png|500

Disk Structure - System Tablespace#

image.png|500

Disk Structure - File-Per-Table Tablespaces:#

image.png|500

Disk Structure - General Tablespaces#

image.png|500

Disk Structure - Undo Tablespaces Temporary Tablespaces#

image.png|500

Disk Structure - Doublewrite Buffer Files Redo Log#

image.png|500

Background Threads#

The role of background threads: flushing memory to disk.

image.png|500

Transaction Principles#

A transaction is a collection of operations that form an indivisible unit of work. A transaction submits or rolls back all operations as a whole to the system, meaning these operations either all succeed or all fail.

ACID Properties

  • Atomicity: A transaction is the smallest indivisible unit of operation; it either succeeds completely or fails completely.
  • Consistency: When a transaction is completed, all data must remain in a consistent state.
  • Isolation: The isolation mechanism provided by the database system ensures that transactions run in an independent environment unaffected by external concurrent operations.
  • Durability: Once a transaction is committed or rolled back, its changes to the data in the database are permanent.

Atomicity - undo log
Durability - redo log
Consistency - undo log + redo log
Isolation - locks + MVCC

redo log - Durability#

When a dirty page is flushed to disk and an error occurs, data recovery is performed based on the redo log.

image.png|500

WAL (write-ahead logging) writes logs first, with logs being appended sequentially, resulting in sequential disk IO rather than random disk IO when flushing dirty pages.

undo log - Atomicity#

The rollback log (undo log) is used to record information before data is modified, providing rollback and MVCC.

Undo logs are mainly used for rollback, so they are logical logs that record opposite operation statements, allowing rollback to read the corresponding content and roll back.

image.png|500

MVCC#

RU: Directly read the most recent data, no MVCC, no locks, nothing.
RC: Each read generates a snapshot read, allowing access to the latest committed data.
RR: The first select statement after the transaction starts generates a snapshot read, which continues to use this read view, ensuring repeatable reads.
SE: Each read is a current read, which will be blocked and also locked.

Current read:
Reads the latest version of the record, ensuring that other concurrent transactions cannot modify the current record during the read, locking the read record, such as:
select..lock in share mode shared lock, select..for update, update, insert, delete exclusive locks are all types of current reads.

Snapshot read:
A simple select (without locking) is a snapshot read, reading the visible version of the record data, which may be historical data, non-blocking read.

  • Read Committed: Each select generates a snapshot read.
  • Repeatable Read: The first select statement after the transaction starts is the snapshot read point.
  • Serializable: Snapshot reads degrade to current reads.

MVCC:
Full name Multi-Version Concurrency Control, refers to maintaining multiple versions of data, allowing read and write operations to have no conflicts. Snapshot reads provide a non-blocking read feature for MySQL's implementation of MVCC.
The specific implementation of MVCC relies on three implicit fields in the database record, undo log, and readView.

Implementation Principle - Three Implicit Fields#

DB_TRX_ID, DB_ROLL_PTR, DB_ROW_ID

image.png|500

Implementation Principle - Undo Log#

The rollback log (undo log) is generated during insert, update, and delete operations, facilitating data rollback.
The undo log generated during insert is only needed during rollback and can be immediately deleted after the transaction is committed.
The undo log generated during update and delete is needed not only during rollback but also during snapshot reads and will not be immediately deleted.

image.png|500

Different transactions or the same transaction modifying the same record will lead to the generation of a record version linked list for that record's undo log, where the head of the list is the latest old record and the tail is the earliest old record.

Implementation Principle - readView#

Each transaction generates a different readView when reading data.

image.png|500

ReadView version chain access rules:

  1. First, check if the current transaction trx_id equals creator_trx_id; if so, access it; if not, go to 2.
  2. Check if the current transaction trx_id is between min_trx_id and max_trx_id (maximum trx_id + 1) and is no longer active (committed); if so, access it; if not, go to 3.
  3. Find the maximum version less than min_trx_id (the earliest committed version).

image.png|500

Read Committed

Transaction 5's first snapshot read, version with trx_id = 2 meets the requirements of condition 2, trx_id < min_trx_id = 3, allowing read of transaction 2's committed state.

Transaction 5's second snapshot read, version with trx_id = 3 meets the requirements of condition 2, trx_id < min_trx_id = 4, allowing read of transaction 3's committed state.

image.png|500

Repeatable Read

The RR transaction isolation level generates a readView only during the first snapshot read execution, and subsequently reuses that readView.

image.png|500

This article is synchronized and updated to xLog by Mix Space. The original link is https://blog.0xling.cyou/posts/mysql/mysql-2

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.