MySQL死锁相关的知识点总结

1. 死锁模拟

按时间顺序，操作顺序如下：

session 1

lcx[root@localhost:(test) 14:51:36]>begin;
Query OK, 0 rows affected (0.00 sec)

lcx[root@localhost:(test) 14:52:50]>select * from test_count where id = 2 for update;
+----+-----+-----------+------+
| id | qid | ip        | ip2  |
+----+-----+-----------+------+
|  2 | 102 | 127.0.1.1 | NULL |
+----+-----+-----------+------+
1 row in set (0.01 sec)

session 2

lcx[root@localhost:(test) 14:52:04]>begin;
Query OK, 0 rows affected (0.00 sec)

lcx[root@localhost:(test) 14:53:16]>select * from test_count where id = 3 for update;
+----+-----+-----------+------+
| id | qid | ip        | ip2  |
+----+-----+-----------+------+
|  3 | 103 | 127.0.1.3 | NULL |
+----+-----+-----------+------+
1 row in set (0.00 sec)

session 1

lcx[root@localhost:(test) 14:53:08]>update test_count set ip = '127.0.2.3' where id = 3;
Query OK, 1 row affected (13.58 sec)
Rows matched: 1  Changed: 1  Warnings: 0

session 2

lcx[root@localhost:(test) 14:53:31]>update test_count set ip = '127.0.2.3' where id = 2;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

general log

2023-07-24T06:52:50.328946Z        31 Query     begin
2023-07-24T06:53:08.872687Z        31 Query     select * from test_count where id = 2 for update
2023-07-24T06:53:16.104128Z        32 Query     begin
2023-07-24T06:53:31.334218Z        32 Query     select * from test_count where id = 3 for update
2023-07-24T06:53:58.749765Z        31 Query     update test_count set ip = '127.0.2.3' where id = 3
2023-07-24T06:54:12.311142Z        32 Query     update test_count set ip = '127.0.2.3' where id = 2

分析：

当开启死锁检测的时候，在出现死锁的时候一个事务出现回滚，会有报错:

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

2. SQL死锁检测机制

MySQL有两个关于deadlock的变量：

lcx[root@localhost:(test) 15:26:21]>show variables like '%dead%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| innodb_deadlock_detect     | ON    |
| innodb_print_all_deadlocks | ON    |
+----------------------------+-------+
2 rows in set (0.00 sec)

2.1 innodb_deadlock_detect

innodb_deadlock_detect 是检测死锁的一种方法,从mysql 5.7.13引入的。

死锁检测的原理是构建一个以事务为顶点、锁为边的有向图，判断有向图是否存在环，存在即有死锁。

回滚机制

检测到死锁之后，选择插入更新或者删除的行数最少的事务回滚，基于 INFORMATION_SCHEMA.INNODB_TRX 表中的 trx_weight 字段来判断。

lcx[root@localhost:(test) 15:39:12]>select * from INFORMATION_SCHEMA.INNODB_TRX\G
*************************** 1. row ***************************
                    trx_id: 143162
                 trx_state: LOCK WAIT
               trx_started: 2023-07-24 15:39:00
     trx_requested_lock_id: 143162:29:3:4
          trx_wait_started: 2023-07-24 15:39:00
                trx_weight: 2
       trx_mysql_thread_id: 41
                 trx_query: select * from test_count where id = 3 for update
       trx_operation_state: starting index read
         trx_tables_in_use: 1
         trx_tables_locked: 1
          trx_lock_structs: 2
     trx_lock_memory_bytes: 1136
           trx_rows_locked: 1
         trx_rows_modified: 0
   trx_concurrency_tickets: 0
       trx_isolation_level: READ COMMITTED
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0

如果关闭innodb_deadlock_detect，也即关闭了死锁自动监测机制时，当两个或多个session间存在死锁的情况下，MySQL怎么去处理？

这里会涉及到另外一个参数：锁超时，也即innodb_lock_wait_timeout，该参数指定了“锁申请时候的最长等待时间”
官方的解释是：The length of time in seconds an InnoDB transaction waits for a row lock before giving up.

innodb_lock_wait_timeout默认值是50秒，也就是意味着session请求时，申请不到锁的情况下最多等待50秒钟，然后呢，就等价于死锁，自动回滚当前事物了？其实不是的，事情没有想象中的简单。

这里就涉及到另外一个参数：innodb_rollback_on_timeout，默认值是off.

2.2 innodb_rollback_on_timeout

该参数的决定了当前请求锁超时之后，回滚的是整个事物，还是仅当前语句，官方的解释是：InnoDB rolls back only the last statement on a transaction timeout by default。

默认值是off，也就是回滚当前语句（放弃当前语句的锁申请），有人强烈建议打开这个选项（on），也就是一旦锁申请超时，就回滚整个事物。

需要注意的是，默认情况下只回滚当前语句，而不是整个事物，当前的事物还在继续，连接也还在，这里与死锁自动监测机制打开之后会主动牺牲一个事物不同，锁超时后并不会主动牺牲其中任何一个事物。这意味着会出现一种非常严重的情况，事务一直在重试，不释放锁。

2.3 innodb_print_all_deadlocks

开启后，会在错误日志中打印死锁的相关信息，比如上面模拟的死锁信息如下：

2023-07-24T08:16:21.143821Z 44 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.
2023-07-24T08:16:21.143856Z 44 [Note] InnoDB:
*** (1) TRANSACTION:

TRANSACTION 143164, ACTIVE 40 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 43, OS thread handle 140043651081984, query id 388 localhost root updating
update test_count set ip = '127.0.1.2' where id = 3
2023-07-24T08:16:21.143902Z 44 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 29 page no 3 n bits 80 index PRIMARY of table `eo_oslog`.`test_count` trx id 143164 lock_mode X locks rec but not gap waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 8; hex 0000000000000003; asc         ;;
 1: len 6; hex 00000000070e; asc       ;;
 2: len 7; hex ac000001200110; asc        ;;
 3: len 8; hex 0000000000000067; asc        g;;
 4: len 9; hex 3132372e302e312e33; asc 127.0.1.3;;
 5: SQL NULL;

2023-07-24T08:16:21.145475Z 44 [Note] InnoDB: *** (2) TRANSACTION:

TRANSACTION 143165, ACTIVE 37 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 44, OS thread handle 140043650541312, query id 389 localhost root updating
update test_count set ip = '127.0.1.2' where id = 2
2023-07-24T08:16:21.145523Z 44 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 29 page no 3 n bits 80 index PRIMARY of table `eo_oslog`.`test_count` trx id 143165 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 8; hex 0000000000000003; asc         ;;
 1: len 6; hex 00000000070e; asc       ;;
 2: len 7; hex ac000001200110; asc        ;;
 3: len 8; hex 0000000000000067; asc        g;;
 4: len 9; hex 3132372e302e312e33; asc 127.0.1.3;;
 5: SQL NULL;

2023-07-24T08:16:21.145850Z 44 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 29 page no 3 n bits 80 index PRIMARY of table `eo_oslog`.`test_count` trx id 143165 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 8; hex 0000000000000002; asc         ;;
 1: len 6; hex 000000000709; asc       ;;
 2: len 7; hex a90000011d0110; asc        ;;
 3: len 8; hex 0000000000000066; asc        f;;
 4: len 9; hex 3132372e302e312e31; asc 127.0.1.1;;
 5: SQL NULL;

2023-07-24T08:16:21.146152Z 44 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)

3. 死锁的处理方式

MySQL死锁是指多个事务相互等待对方所持有的资源，导致无法继续执行的情况。当发生死锁时，MySQL会自动选择一个事务进行回滚，以解除死锁并恢复正常运行。以下是一些处理MySQL死锁的方法：

设置合适的事务隔离级别：事务隔离级别可以通过设置来控制并发访问数据库时的锁定行为。合理选择事务隔离级别能够减少死锁的概率。
减少事务持有锁的时间：尽量缩短事务的执行时间，避免在一个事务中同时持有多个资源的锁，减少死锁的可能性。
合理设计数据库索引：合理的索引设计可以提高查询性能，并减少锁的竞争。考虑使用覆盖索引、避免过长的事务等方法。
监控和分析死锁：通过MySQL的日志或工具，可以监控和分析死锁情况，找出引发死锁的原因，并采取相应的措施进行优化。
优化数据库设计和应用程序逻辑：良好的数据库设计和应用程序逻辑能够减少并发访问时的资源竞争，降低死锁的风险。

请注意，以上只是一些常见的处理MySQL死锁的方法，具体应根据实际情况进行分析和优化。

等待，直到超时（innodb_lock_wait_timeout=50s），自动回滚事务；
发起死锁检测，主动回滚一条事务，让其他事务继续执行（innodb_deadlock_detect=on）。

一些超时参数设置

show variables like '%timeout%';
+-----------------------------------+----------+
| Variable_name                     | Value    |
+-----------------------------------+----------+
| clone_ddl_timeout                 | 300      |
| connect_timeout                   | 10       |
| delayed_insert_timeout            | 300      |
| have_statement_timeout            | YES      |
| innodb_flush_log_at_timeout       | 1        |
| innodb_lock_wait_timeout          | 50       |
| innodb_rollback_on_timeout        | ON       |
| interactive_timeout               | 28800    |
| lock_wait_timeout                 | 180      |
| mysqlx_connect_timeout            | 30       |
| mysqlx_idle_worker_thread_timeout | 60       |
| mysqlx_interactive_timeout        | 28800    |
| mysqlx_port_open_timeout          | 0        |
| mysqlx_read_timeout               | 30       |
| mysqlx_wait_timeout               | 28800    |
| mysqlx_write_timeout              | 60       |
| net_read_timeout                  | 3600     |
| net_write_timeout                 | 3600     |
| rpl_stop_slave_timeout            | 31536000 |
| slave_net_timeout                 | 60       |
| wait_timeout                      | 28800    |
+-----------------------------------+----------+
21 rows in set (0.00 sec)

遇到的错误：

中途网络断开，后来重新连接跳板机，在执行select * from test_count where id = 2 for update时发现卡住了，非常奇怪，过了一会儿，报错锁等待超时：

lcx[root@localhost:(test) 15:11:15]>select * from test_count where id = 2 for update;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

再次执行还是卡住，查询锁等待：

lcx[root@localhost:(test) 15:13:38]> select * from sys.innodb_lock_waits \G
*************************** 1. row ***************************
                wait_started: 2023-07-24 15:13:35
                    wait_age: 00:00:13
               wait_age_secs: 13
                locked_table: `test`.`test_count`
                locked_index: PRIMARY
                 locked_type: RECORD
              waiting_trx_id: 143152
         waiting_trx_started: 2023-07-24 15:10:24
             waiting_trx_age: 00:03:24
     waiting_trx_rows_locked: 3
   waiting_trx_rows_modified: 0
                 waiting_pid: 37
               waiting_query: select * from test_count where id = 2 for update
             waiting_lock_id: 143152:29:3:3
           waiting_lock_mode: X
             blocking_trx_id: 143148
                blocking_pid: 31
              blocking_query: NULL
            blocking_lock_id: 143148:29:3:3
          blocking_lock_mode: X
        blocking_trx_started: 2023-07-24 14:53:08
            blocking_trx_age: 00:20:40
    blocking_trx_rows_locked: 2
  blocking_trx_rows_modified: 1
     sql_kill_blocking_query: KILL QUERY 31
sql_kill_blocking_connection: KILL 31
1 row in set, 3 warnings (0.00 sec)

手动执行kill命令：

lcx[root@localhost:(test) 15:13:48]>kill 31;
Query OK, 0 rows affected (0.00 sec)

SQL恢复：

lcx[root@localhost:(test) 15:13:17]>select * from test_count where id = 2 for update;
+----+-----+-----------+------+
| id | qid | ip        | ip2  |
+----+-----+-----------+------+
|  2 | 102 | 127.0.1.1 | NULL |
+----+-----+-----------+------+
1 row in set (36.73 sec)

目录CONTENT