第21期：索引设计（函数索引）|key|操作符|json|调用

分享至

本篇主要介绍 MySQL 的函数索引（也叫表达式索引）。

通常来讲，索引都是基于字段本身或者字段前缀（第 20 篇），而函数索引是基于字段本身加上函数、操作符、表达式等计算而来。如果将表达式或者操作符也看做函数的话，简单来说，这样的索引就可以统称函数索引。

MySQL 的函数索引内部是基于虚拟列（generated columns）实现，不同于直接定义虚拟列，函数索引自动创建的虚拟列本身实时计算结果，并不存储数据，只把函数索引本身存在磁盘上。

MySQL 8.0.13 之前不支持函数索引，所以老版本包括现在主流的 MySQL 5.7 也不支持函数索引，需要手工模拟创建或者改 SQL。

本章基于以下几点来讲函数索引：

1.函数索引的使用场景

函数索引最最经典的使用场景莫过于就是对日期的处理，特别是表中只定义了一个字段，后期对这个字段的查询都是基于部分结果。比如 “2100-02-02 08:09:09.123972” 包含了日期 “2100-02-02”，时间 “08:09:09”，小数位时间 “123972”，有可能会对这个值拆解后部分查询。

举个简单例子，表 t1 有两个字段，一个主键，另外一个时间字段，总记录数不到 40W。

show create table t1\G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `id` bigint unsigned NOT NULL AUTO_INCREMENT, `log_time` datetime(6) DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_log_time` (`log_time`) ) ENGINE=InnoDB AUTO_INCREMENT=524268 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 1 row in set (0.00 sec) select count(*) from t1; +----------+ | count(*) | +----------+ | 393216 | +----------+ 1 row in set (0.07 sec)

执行下面这条 SQL 1，把日期单独拿出来，执行了 0.09 秒。

# SQL 1 select * from t1 where date(log_time) = '2100-02-02'; +--------+----------------------------+ | id | log_time | +--------+----------------------------+ | 524267 | 2100-02-02 08:09:09.123972 | +--------+----------------------------+ 1 row in set (0.09 sec)

看下它的执行计划，虽然走了索引，但是扫描行数为总记录数，相当于全表扫，这时候比全表扫还不理想，全表扫直接走聚簇索引还快点。

explain select * from t1 where date(log_time) = '2100-02-02'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t1 partitions: NULL type: index possible_keys: NULL key: idx_log_time key_len: 9 ref: NULL rows: 392413 filtered: 100.00 Extra: Using where; Using index 1 row in set, 1 warning (0.00 sec)

这时最好的方法就是为列 log_time 加一新索引，基于函数 date 的函数索引。

alter table t1 add key idx_func_index_1((date(log_time))); Query OK, 0 rows affected (2.76 sec) Records: 0 Duplicates: 0 Warnings: 0

再次执行上面的 SQL 1，瞬间执行完毕。

select * from t1 where date(log_time) = '2100-02-02'; +--------+----------------------------+ | id | log_time | +--------+----------------------------+ | 524267 | 2100-02-02 08:09:09.123972 | +--------+----------------------------+ 1 row in set (0.00 sec)

接下来查看执行计划，结果显示走函数索引 idx_func_index_1 扫描记录数只有一行，执行计划达到最优。

explain select * from t1 where date(log_time) = '2100-02-02'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t1 partitions: NULL type: ref possible_keys: idx_func_index_1 key: idx_func_index_1 key_len: 4 ref: const rows: 1 filtered: 100.00 Extra: NULL 1 row in set, 1 warning (0.00 sec)

如果想查看 MySQL 函数索引内部创建的列，直接 show create table 看是没有结果的，比如下面只看到一个新的索引。

show create table t1\G ... KEY `idx_func_index_1` ((cast(`log_time` as date))) ) ENGINE=InnoDB AUTO_INCREMENT=524268 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 1 row in set (0.00 sec)

通过 MySQL 8.0 的新语句 show extended columns 查看隐藏的列，下面结果发现确实是新加了一个虚拟列。

2.函数索引在处理 JSON 类型的注意事项

比如需要遍历 JSON 类型的子串作为索引，直接用遍历操作符 ->> 报错。

create table t2 (id int primary key, r1 json); Query OK, 0 rows affected (0.09 sec) alter table t2 add key idx_func_index_2((r1->>'$.x')); ERROR 3757 (HY000): Cannot create a functional index on an expression that returns a BLOB or TEXT. Please consider using CAST.

操作符 ->> 表示从 JSON 串中遍历指定路径的 value，在 MySQL 内部转换为 json_unquote(jso_extract(...))，而函数 json_unquote 返回结果具有以下特性：

• 数据类型为 longtext，在 MySQL 里 longtext 类型只支持前缀索引，必须用函数 cast 来转换类型。
• json_unquote 调用结果的排序规则为 utf8mb4_bin，cast 调用结果的排序规则和当前 session 的排序规则一样，有可能不是 utf8mb4_bin，所以函数索引中要显式定义排序规则。

所以针对 JSON 字段来建立新的函数索引：

alter table t2 add key idx_func_index_2((cast(r1->>'$.x' as char(1)) collate utf8mb4_bin)); Query OK, 0 rows affected (0.07 sec) Records: 0 Duplicates: 0 Warnings: 0

看下表结构，操作符 ->> 被转换为 json_unquote(json_extract(...))，并且排序规则为 utf8mb4_bin。

show create table t2\G *************************** 1. row *************************** Table: t2 ... KEY `idx_func_index_2` (((cast(json_unquote(json_extract(`r1`,_utf8mb4'$.x')) as char(1) charset utf8mb4) collate utf8mb4_bin))) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 1 row in set (0.00 sec)

接下来插入几条记录，看看这个函数索引的使用。

select * from t2; +----+---------------------+ | id | r1 | +----+---------------------+ | 1 | {"x": "1", "y": 10} | | 2 | {"x": "2", "y": 20} | | 3 | {"x": "a", "y": 20} | | 4 | {"x": "A", "y": 20} | +----+---------------------+ 4 rows in set (0.00 sec)

执行下 SQL 2，并且看下执行计划，直接走了刚才创建的函数索引。

# SQL 2 select * from t2 where r1->>'$.x'='a'; +----+---------------------+ | id | r1 | +----+---------------------+ | 3 | {"x": "a", "y": 20} | +----+---------------------+ 1 row in set (0.00 sec) explain select * from t2 where r1->>'$.x'='a'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t2 partitions: NULL type: ref possible_keys: idx_func_index_2 key: idx_func_index_2 key_len: 7 ref: const rows: 1 filtered: 100.00 Extra: NULL 1 row in set, 1 warning (0.00 sec)

这里其实应该有个疑问，对函数索引的调用，必须要按照之前定义好的函数来执行，否则不会用到索引，那 SQL 2 怎么可以直接到用索引？ MySQL 在这块儿其实内部已经转换为正确的语句。查看下刚才 EXPLAIN 的 WARNINGS 信息。可以看到 SQL 2 被 MySQL 转换为遵守函数索引规则的正确语句。

show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `ytt`.`t2`.`id` AS `id`,`ytt`.`t2`.`r1` AS `r1` from `ytt`.`t2` where ((cast(json_unquote(json_extract(`ytt`.`t2`.`r1`,_utf8mb4'$.x')) as char(1) charset utf8mb4) collate utf8mb4_bin) = 'a') 1 row in set (0.00 sec)

3.函数索引替代前缀索引？

之前讲过前缀索引，可能会有这样的疑问。前缀索引能不能被函数索引替代？当然是不行的！函数索引要求查询条件严格按照函数索引的定义匹配，虽然有的场景下 MySQL 可以内部转换，但是 MySQL 无法为每个函数都替换为最优化的写法。比如函数 substring,left,right 等。

下面例子用来模拟下是否可以用函数索引替代前缀索引。示例表 t3，一个前缀索引和两个函数索引实现的目的一样，但是实际查询的时候 SQL 语句并不一样。

show create table t3\G *************************** 1. row *************************** Table: t3 Create Table: CREATE TABLE `t3` ( `id` bigint unsigned NOT NULL AUTO_INCREMENT, `r1` char(36) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `id` (`id`), KEY `idx_r1_prefix` (`r1`(8)), KEY `idx_func_index_3` ((left(`r1`,8))), KEY `idx_func_index_4` ((substr(`r1`,1,8))) ) ENGINE=InnoDB AUTO_INCREMENT=249 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 1 row in set (0.00 sec) 以下 SQL 3 、SQL 4、SQL 5 写法不一样，查询结果一样，走的索引不一样。 # SQL 3 select * from t3 where r1 like 'de45c7d9%'; # SQL 4 select * from t3 where left(r1,8) ='de45c7d9'; # SQL 5 select * from t3 where substring(r1,1,8) ='de45c7d9'; select * from t3 where r1 like 'de45c7d9%'; +-----+--------------------------------------+ | id | r1 | +-----+--------------------------------------+ | 178 | de45c7d9-935c-11ea-8421-08002753f58d | +-----+--------------------------------------+ 1 row in set (0.00 sec) select * from t3 where left(r1,8) ='de45c7d9'; +-----+--------------------------------------+ | id | r1 | +-----+--------------------------------------+ | 178 | de45c7d9-935c-11ea-8421-08002753f58d | +-----+--------------------------------------+ 1 row in set (0.00 sec) select * from t3 where substring(r1,1,8) ='de45c7d9'; +-----+--------------------------------------+ | id | r1 | +-----+--------------------------------------+ | 178 | de45c7d9-935c-11ea-8421-08002753f58d | +-----+--------------------------------------+ 1 row in set (0.00 sec)

各自的查询计划，每条 SQL 走的不同的索引。

explain select * from t3 where r1 like 'de45c7d9%'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t3 partitions: NULL type: range possible_keys: idx_r1_prefix key: idx_r1_prefix key_len: 33 ref: NULL rows: 1 filtered: 100.00 Extra: Using where 1 row in set, 1 warning (0.00 sec) explain select * from t3 where left(r1,8) ='de45c7d9'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t3 partitions: NULL type: ref possible_keys: idx_func_index_3 key: idx_func_index_3 key_len: 35 ref: const rows: 1 filtered: 100.00 Extra: Using where 1 row in set, 1 warning (0.00 sec) explain select * from t3 where substring(r1,1,8) ='de45c7d9'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t3 partitions: NULL type: ref possible_keys: idx_func_index_4 key: idx_func_index_4 key_len: 35 ref: const rows: 1 filtered: 100.00 Extra: Using where 1 row in set, 1 warning (0.00 sec)

此时删除掉函数索引 idx_func_index_3, SQL 4 就无法走正确的索引。

alter table t3 drop key idx_func_index_3; Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 explain select * from t3 where left(r1,8) ='de45c7d9'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t3 partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 128 filtered: 100.00 Extra: Using where 1 row in set, 1 warning (0.00 sec)

查看 warnings，发现 MySQL 优化器转换后的 SQL，LEFT 函数还是保持原样，但是表里没有基于 LEFT 函数的索引，只能全表扫。

show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: /* select#1 */ select `ytt`.`t3`.`id` AS `id`,`ytt`.`t3`.`r1` AS `r1` from `ytt`.`t3` where (left(`ytt`.`t3`.`r1`,8) = 'de45c7d9') 1 row in set (0.00 sec)

4.老版本如何实现函数索引

函数索引是 MySQL 8.0.13 才有的。那在老的版本如何实现呢？

MySQL 5.7 自持虚拟列，只需要在虚拟列上创建一个普通索引就行。

MySQL 5.6 以及 MySQL 5.5 等，则需要自己定义一个冗余列，然后定期更新这列内容。当然最核心的是如何规划好定期更新内容这块。这块如果讨论起来，内容非常多，可以参考我之前写的关于表样例数据更新收集这块内容，MySQL 内部的做法。

关于 MySQL 的技术内容，你们还有什么想知道的吗？赶紧留言告诉小编吧！

特别声明：以上内容(如有图片或视频亦包括在内)为自媒体平台“网易号”用户上传并发布，本平台仅提供信息存储服务。

Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.