1. 死锁模拟
按时间顺序,操作顺序如下:
session 1
lcx[root@localhost:(test) 14:51:36]>begin;
Query OK, 0 rows affected (0.00 sec)
lcx[root@localhost:(test) 14:52:50]>select * from test_count where id = 2 for update;
+----+-----+-----------+------+
| id | qid | ip | ip2 |
+----+-----+-----------+------+
| 2 | 102 | 127.0.1.1 | NULL |
+----+-----+-----------+------+
1 row in set (0.01 sec)
session 2
lcx[root@localhost:(test) 14:52:04]>begin;
Query OK, 0 rows affected (0.00 sec)
lcx[root@localhost:(test) 14:53:16]>select * from test_count where id = 3 for update;
+----+-----+-----------+------+
| id | qid | ip | ip2 |
+----+-----+-----------+------+
| 3 | 103 | 127.0.1.3 | NULL |
+----+-----+-----------+------+
1 row in set (0.00 sec)
session 1
lcx[root@localhost:(test) 14:53:08]>update test_count set ip = '127.0.2.3' where id = 3;
Query OK, 1 row affected (13.58 sec)
Rows matched: 1 Changed: 1 Warnings: 0
session 2
lcx[root@localhost:(test) 14:53:31]>update test_count set ip = '127.0.2.3' where id = 2;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
general log
2023-07-24T06:52:50.328946Z 31 Query begin
2023-07-24T06:53:08.872687Z 31 Query select * from test_count where id = 2 for update
2023-07-24T06:53:16.104128Z 32 Query begin
2023-07-24T06:53:31.334218Z 32 Query select * from test_count where id = 3 for update
2023-07-24T06:53:58.749765Z 31 Query update test_count set ip = '127.0.2.3' where id = 3
2023-07-24T06:54:12.311142Z 32 Query update test_count set ip = '127.0.2.3' where id = 2
分析:
当开启死锁检测的时候,在出现死锁的时候一个事务出现回滚,会有报错:
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
2. SQL死锁检测机制
MySQL有两个关于deadlock的变量:
lcx[root@localhost:(test) 15:26:21]>show variables like '%dead%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| innodb_deadlock_detect | ON |
| innodb_print_all_deadlocks | ON |
+----------------------------+-------+
2 rows in set (0.00 sec)
2.1 innodb_deadlock_detect
innodb_deadlock_detect 是检测死锁的一种方法,从mysql 5.7.13引入的。
死锁检测的原理是构建一个以事务为顶点、锁为边的有向图,判断有向图是否存在环,存在即有死锁。
回滚机制
检测到死锁之后,选择插入更新或者删除的行数最少的事务回滚,基于 INFORMATION_SCHEMA.INNODB_TRX
表中的 trx_weight 字段来判断。
lcx[root@localhost:(test) 15:39:12]>select * from INFORMATION_SCHEMA.INNODB_TRX\G
*************************** 1. row ***************************
trx_id: 143162
trx_state: LOCK WAIT
trx_started: 2023-07-24 15:39:00
trx_requested_lock_id: 143162:29:3:4
trx_wait_started: 2023-07-24 15:39:00
trx_weight: 2
trx_mysql_thread_id: 41
trx_query: select * from test_count where id = 3 for update
trx_operation_state: starting index read
trx_tables_in_use: 1
trx_tables_locked: 1
trx_lock_structs: 2
trx_lock_memory_bytes: 1136
trx_rows_locked: 1
trx_rows_modified: 0
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 0
trx_is_read_only: 0
trx_autocommit_non_locking: 0
如果关闭innodb_deadlock_detect,也即关闭了死锁自动监测机制时,当两个或多个session间存在死锁的情况下,MySQL怎么去处理?
这里会涉及到另外一个参数:锁超时,也即innodb_lock_wait_timeout,该参数指定了“锁申请时候的最长等待时间”
官方的解释是:The length of time in seconds an InnoDB transaction waits for a row lock before giving up.
innodb_lock_wait_timeout默认值是50秒,也就是意味着session请求时,申请不到锁的情况下最多等待50秒钟,然后呢,就等价于死锁,自动回滚当前事物了?其实不是的,事情没有想象中的简单。
这里就涉及到另外一个参数:innodb_rollback_on_timeout,默认值是off.
2.2 innodb_rollback_on_timeout
该参数的决定了当前请求锁超时之后,回滚的是整个事物,还是仅当前语句,官方的解释是:InnoDB rolls back only the last statement on a transaction timeout by default。
默认值是off,也就是回滚当前语句(放弃当前语句的锁申请),有人强烈建议打开这个选项(on),也就是一旦锁申请超时,就回滚整个事物。
需要注意的是,默认情况下只回滚当前语句,而不是整个事物,当前的事物还在继续,连接也还在,这里与死锁自动监测机制打开之后会主动牺牲一个事物不同,锁超时后并不会主动牺牲其中任何一个事物。这意味着会出现一种非常严重的情况,事务一直在重试,不释放锁。
2.3 innodb_print_all_deadlocks
开启后,会在错误日志中打印死锁的相关信息,比如上面模拟的死锁信息如下:
2023-07-24T08:16:21.143821Z 44 [Note] InnoDB: Transactions deadlock detected, dumping detailed information.
2023-07-24T08:16:21.143856Z 44 [Note] InnoDB:
*** (1) TRANSACTION:
TRANSACTION 143164, ACTIVE 40 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 43, OS thread handle 140043651081984, query id 388 localhost root updating
update test_count set ip = '127.0.1.2' where id = 3
2023-07-24T08:16:21.143902Z 44 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 29 page no 3 n bits 80 index PRIMARY of table `eo_oslog`.`test_count` trx id 143164 lock_mode X locks rec but not gap waiting
Record lock, heap no 4 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 8; hex 0000000000000003; asc ;;
1: len 6; hex 00000000070e; asc ;;
2: len 7; hex ac000001200110; asc ;;
3: len 8; hex 0000000000000067; asc g;;
4: len 9; hex 3132372e302e312e33; asc 127.0.1.3;;
5: SQL NULL;
2023-07-24T08:16:21.145475Z 44 [Note] InnoDB: *** (2) TRANSACTION:
TRANSACTION 143165, ACTIVE 37 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 44, OS thread handle 140043650541312, query id 389 localhost root updating
update test_count set ip = '127.0.1.2' where id = 2
2023-07-24T08:16:21.145523Z 44 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 29 page no 3 n bits 80 index PRIMARY of table `eo_oslog`.`test_count` trx id 143165 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 8; hex 0000000000000003; asc ;;
1: len 6; hex 00000000070e; asc ;;
2: len 7; hex ac000001200110; asc ;;
3: len 8; hex 0000000000000067; asc g;;
4: len 9; hex 3132372e302e312e33; asc 127.0.1.3;;
5: SQL NULL;
2023-07-24T08:16:21.145850Z 44 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 29 page no 3 n bits 80 index PRIMARY of table `eo_oslog`.`test_count` trx id 143165 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 8; hex 0000000000000002; asc ;;
1: len 6; hex 000000000709; asc ;;
2: len 7; hex a90000011d0110; asc ;;
3: len 8; hex 0000000000000066; asc f;;
4: len 9; hex 3132372e302e312e31; asc 127.0.1.1;;
5: SQL NULL;
2023-07-24T08:16:21.146152Z 44 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)
3. 死锁的处理方式
MySQL死锁是指多个事务相互等待对方所持有的资源,导致无法继续执行的情况。当发生死锁时,MySQL会自动选择一个事务进行回滚,以解除死锁并恢复正常运行。以下是一些处理MySQL死锁的方法:
-
设置合适的事务隔离级别:事务隔离级别可以通过设置来控制并发访问数据库时的锁定行为。合理选择事务隔离级别能够减少死锁的概率。
-
减少事务持有锁的时间:尽量缩短事务的执行时间,避免在一个事务中同时持有多个资源的锁,减少死锁的可能性。
-
合理设计数据库索引:合理的索引设计可以提高查询性能,并减少锁的竞争。考虑使用覆盖索引、避免过长的事务等方法。
-
监控和分析死锁:通过MySQL的日志或工具,可以监控和分析死锁情况,找出引发死锁的原因,并采取相应的措施进行优化。
-
优化数据库设计和应用程序逻辑:良好的数据库设计和应用程序逻辑能够减少并发访问时的资源竞争,降低死锁的风险。
请注意,以上只是一些常见的处理MySQL死锁的方法,具体应根据实际情况进行分析和优化。
- 等待,直到超时(
innodb_lock_wait_timeout=50s
),自动回滚事务; - 发起死锁检测,主动回滚一条事务,让其他事务继续执行(
innodb_deadlock_detect=on
)。
一些超时参数设置
show variables like '%timeout%';
+-----------------------------------+----------+
| Variable_name | Value |
+-----------------------------------+----------+
| clone_ddl_timeout | 300 |
| connect_timeout | 10 |
| delayed_insert_timeout | 300 |
| have_statement_timeout | YES |
| innodb_flush_log_at_timeout | 1 |
| innodb_lock_wait_timeout | 50 |
| innodb_rollback_on_timeout | ON |
| interactive_timeout | 28800 |
| lock_wait_timeout | 180 |
| mysqlx_connect_timeout | 30 |
| mysqlx_idle_worker_thread_timeout | 60 |
| mysqlx_interactive_timeout | 28800 |
| mysqlx_port_open_timeout | 0 |
| mysqlx_read_timeout | 30 |
| mysqlx_wait_timeout | 28800 |
| mysqlx_write_timeout | 60 |
| net_read_timeout | 3600 |
| net_write_timeout | 3600 |
| rpl_stop_slave_timeout | 31536000 |
| slave_net_timeout | 60 |
| wait_timeout | 28800 |
+-----------------------------------+----------+
21 rows in set (0.00 sec)
遇到的错误:
中途网络断开,后来重新连接跳板机,在执行select * from test_count where id = 2 for update
时发现卡住了,非常奇怪,过了一会儿,报错锁等待超时:
lcx[root@localhost:(test) 15:11:15]>select * from test_count where id = 2 for update;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
再次执行还是卡住,查询锁等待:
lcx[root@localhost:(test) 15:13:38]> select * from sys.innodb_lock_waits \G
*************************** 1. row ***************************
wait_started: 2023-07-24 15:13:35
wait_age: 00:00:13
wait_age_secs: 13
locked_table: `test`.`test_count`
locked_index: PRIMARY
locked_type: RECORD
waiting_trx_id: 143152
waiting_trx_started: 2023-07-24 15:10:24
waiting_trx_age: 00:03:24
waiting_trx_rows_locked: 3
waiting_trx_rows_modified: 0
waiting_pid: 37
waiting_query: select * from test_count where id = 2 for update
waiting_lock_id: 143152:29:3:3
waiting_lock_mode: X
blocking_trx_id: 143148
blocking_pid: 31
blocking_query: NULL
blocking_lock_id: 143148:29:3:3
blocking_lock_mode: X
blocking_trx_started: 2023-07-24 14:53:08
blocking_trx_age: 00:20:40
blocking_trx_rows_locked: 2
blocking_trx_rows_modified: 1
sql_kill_blocking_query: KILL QUERY 31
sql_kill_blocking_connection: KILL 31
1 row in set, 3 warnings (0.00 sec)
手动执行kill命令:
lcx[root@localhost:(test) 15:13:48]>kill 31;
Query OK, 0 rows affected (0.00 sec)
SQL恢复:
lcx[root@localhost:(test) 15:13:17]>select * from test_count where id = 2 for update;
+----+-----+-----------+------+
| id | qid | ip | ip2 |
+----+-----+-----------+------+
| 2 | 102 | 127.0.1.1 | NULL |
+----+-----+-----------+------+
1 row in set (36.73 sec)
评论区