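This afternoon a colleague asked me to write some SQL to remove duplicate rows from a MySQL table. I remembered an earlier blog post of mine on MySQL deduplication that used export/import plus a unique index, but that approach has a very large impact on the business, so this time I wrote a stored procedure to delete the duplicates. It ended up taking the whole afternoon because of a bug, which was frustrating and a real waste of time.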

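Contents

MySQL table defragmentation

  • 1. Calculating fragmentation size
  • 2. Defragmenting
    • 2.1 Using the alter table table_name engine = innodb command.
    • 2.2 Using the pt-online-schema-change tool, which can also rebuild a table online and reclaim fragmentation.
    • 2.3 Using the optimize table command.
  • 3. Shell script for defragmenting tables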

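Here is a brief description of the process. The logic for deleting duplicate rows is simple:

I. Overview

1. Calculating fragmentation size

To defragment a table, you first need to understand how fragmentation is measured.

You can check it with the show table status [from|in db_name] like '%table_name%' command:

mysql> show table status from employees like 't1'\G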
*************************** 1. row ***************************
           Name: t1
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 1176484
 Avg_row_length: 86
    Data_length: 101842944
Max_data_length: 0
   Index_length: 0
      Data_free: 39845888
 Auto_increment: NULL
    Create_time: 2018-08-28 13:40:19
    Update_time: 2018-08-28 13:50:43
     Check_time: NULL
      Collation: utf8mb4_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

 

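Fragmentation size = total allocated size - estimated actual data size

  • Total allocated size = Data_length + Index_length = 101842944 + 0 = 101842944

  • Estimated actual data size = Rows * Avg_row_length = 1176484 * 86 = 101177624

  • Fragmentation size = (101842944 - 101177624) / 1024 / 1024 = 0.63MB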
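
The arithmetic above can be sketched in a few lines of Python. This is only a minimal sketch: the function name fragmentation_mb is hypothetical, the total is taken as Data_length plus Index_length, and the inputs are the values from the show table status output for t1. Because InnoDB reports Rows as an estimate, the result is approximate.

```python
def fragmentation_mb(data_length, index_length, rows, avg_row_length):
    """Estimate fragmentation in MB from SHOW TABLE STATUS fields."""
    total_bytes = data_length + index_length  # space allocated to the table
    actual_bytes = rows * avg_row_length      # estimated size of the live rows
    return (total_bytes - actual_bytes) / 1024 / 1024


# Values taken from the t1 output above.
print(round(fragmentation_mb(101842944, 0, 1176484, 86), 2))
```

With the t1 values this gives about 0.63 MB, matching the hand calculation.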

You can also check whether a table has fragmentation through the DATA_FREE column of information_schema.tables:

SELECT t.TABLE_SCHEMA,
       t.TABLE_NAME,
       t.TABLE_ROWS,
       t.DATA_LENGTH,
       t.INDEX_LENGTH,
       concat(round(t.DATA_FREE / 1024 / 1024, 2), 'M') AS datafree
FROM information_schema.tables t
WHERE t.TABLE_SCHEMA = 'employees';


+--------------+--------------+------------+-------------+--------------+----------+
| TABLE_SCHEMA | TABLE_NAME   | TABLE_ROWS | DATA_LENGTH | INDEX_LENGTH | datafree |
+--------------+--------------+------------+-------------+--------------+----------+
| employees    | departments  |          9 |       16384 |        16384 | 0.00M    |
| employees    | dept_emp     |     331143 |    12075008 |     11567104 | 0.00M    |
| employees    | dept_manager |         24 |       16384 |        32768 | 0.00M    |
| employees    | employees    |     299335 |    15220736 |            0 | 0.00M    |
| employees    | salaries     |    2838426 |   100270080 |     36241408 | 5.00M    |
| employees    | t1           |    1191784 |    48824320 |     17317888 | 5.00M    |
| employees    | titles       |     442902 |    20512768 |     11059200 | 0.00M    |
| employees    | ttt          |          2 |       16384 |            0 | 0.00M    |
+--------------+--------------+------------+-------------+--------------+----------+
8 rows in set (0.00 sec)

 

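1. Using the duplicate criteria, find the smallest primary key (usually the ID column) in each set of duplicate rows.

II. Installing MySQL

2. Defragmenting

2. Among the rows that match the duplicate criteria, delete every row whose primary key is greater than that smallest primary key.

III. Verifying the installation

2.1 Defragment with the alter table table_name engine = innodb command.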

 root@localhost [employees] 14:27:01> alter table t1   engine=innodb;

 Query OK, 0 rows affected (5.69 sec)
 Records: 0  Duplicates: 0  Warnings: 0

 root@localhost [employees] 14:27:15> show table status like 't1'\G
 *************************** 1. row ***************************
           Name: t1
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 1191062
 Avg_row_length: 48
    Data_length: 57229312
Max_data_length: 0
   Index_length: 0
      Data_free: 2097152
 Auto_increment: NULL
    Create_time: 2018-08-28 14:27:15
    Update_time: NULL
     Check_time: NULL
      Collation: utf8mb4_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
 1 row in set (0.00 sec)

 

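Suppose I have the following table and need to delete duplicate rows whose start_time and end_time are both identical.

IV. Downloading and using Navicat for MySQL

2.2 The pt-online-schema-change tool can also rebuild a table online and reclaim fragmentation.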

 [root@mysqldb1 14:29:29 /root]
 # pt-online-schema-change --alter="ENGINE=innodb" D=employees,t=t1 --execute
 Cannot chunk the original table `employees`.`t1`: There is no good index and the table is oversized. at /opt/percona-toolkit-3.0.11/bin/pt-online-schema-change line 5852.

 

 The table must have a primary key or a unique index for the tool to run.

 [root@mysqldb1 14:31:16 /root]
# pt-online-schema-change --alter='engine=innodb' D=employees,t=salaries --execute
No slaves found.  See --recursion-method if host mysqldb1 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `employees`.`salaries`...
Creating new table...
Created new table employees._salaries_new OK.
Altering new table...
Altered `employees`.`_salaries_new` OK.
2018-08-28T14:37:01 Creating triggers...
2018-08-28T14:37:01 Created triggers OK.
2018-08-28T14:37:01 Copying approximately 2838426 rows...
Copying `employees`.`salaries`:  74% 00:10 remain
2018-08-28T14:37:41 Copied rows OK.
2018-08-28T14:37:41 Analyzing new table...
2018-08-28T14:37:42 Swapping tables...
2018-08-28T14:37:42 Swapped original and new tables OK.
2018-08-28T14:37:42 Dropping old table...
2018-08-28T14:37:42 Dropped old table `employees`.`_salaries_old` OK.
2018-08-28T14:37:42 Dropping triggers...
2018-08-28T14:37:42 Dropped triggers OK.
Successfully altered `employees`.`salaries`.

 


 

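2.3 Defragment with the optimize table command.

When you run OPTIMIZE TABLE, InnoDB creates a new .ibd file with a temporary name, using only the space required to store the actual data. When the optimization is complete, InnoDB removes the old .ibd file and replaces it with the new one. If the previous .ibd file had grown significantly but the actual data accounted for only part of its size, running OPTIMIZE TABLE can reclaim the unused space.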

mysql>optimize table account;
+--------------+----------+----------+-------------------------------------------------------------------+
| Table        | Op       | Msg_type | Msg_text                                                          |
+--------------+----------+----------+-------------------------------------------------------------------+
| test.account | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.account | optimize | status   | OK                                                                |
+--------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.09 sec)

 

The stored procedure is as follows:

I. Downloading MySQL

3. Shell script for defragmenting tables

# cat optimize_table.sh

#!/bin/sh
# Rebuild every table in the employees schema to reclaim fragmented space.
socket=/tmp/mysql3306.sock
time=`date +"%Y-%m-%d"`
SQL="select concat(d.TABLE_SCHEMA,'.',d.TABLE_NAME) from information_schema.TABLES d where d.TABLE_SCHEMA = 'employees'"

# List schema.table names, dropping the column header line.
optimize_table_name=$(/usr/local/mysql/bin/mysql -S $socket -e "$SQL"|grep -v "TABLE_NAME")

echo "Begin Optimize Table at: "`date +"%Y-%m-%d %H:%M:%S"`>/tmp/optimize_table_$time.log

for table_list in $optimize_table_name
do

# Log each rebuild, then run it.
echo `date +"%Y-%m-%d %H:%M:%S"` "alter table $table_list engine=innodb …">>/tmp/optimize_table_$time.log
/usr/local/mysql/bin/mysql -S $socket -e "alter table $table_list engine=innoDB"

done
echo "End Optimize Table at: "`date +"%Y-%m-%d %H:%M:%S"`>>/tmp/optimize_table_$time.log

Output

# cat optimize_table_2018-08-30.log

Begin Optimize Table at: 2018-08-30 08:43:21
2018-08-30 08:43:21 alter table employees.departments engine=innodb …
2018-08-30 08:43:21 alter table employees.dept_emp engine=innodb …
2018-08-30 08:43:27 alter table employees.dept_manager engine=innodb …
2018-08-30 08:43:27 alter table employees.employees engine=innodb …
2018-08-30 08:43:32 alter table employees.salaries engine=innodb …
2018-08-30 08:44:02 alter table employees.t1 engine=innodb …
2018-08-30 08:44:17 alter table employees.titles engine=innodb …
2018-08-30 08:44:28 alter table employees.ttt engine=innodb …
End Optimize Table at: 2018-08-30 08:44:28

 

 

DELIMITER //
DROP PROCEDURE IF EXISTS Del_Dup_FOR_TEST;
CREATE PROCEDURE Del_Dup_FOR_TEST()
BEGIN
  -- The v_ prefix keeps these variables from shadowing the start_time/end_time
  -- columns and the count alias used in the cursor query.
  DECLARE min_id INT;
  DECLARE v_start_time,v_end_time DATETIME;
  DECLARE v_count INT;
  DECLARE done INT DEFAULT 0;
  -- One row per duplicate group: the group key, its smallest id, and its size.
  DECLARE my_cur CURSOR FOR SELECT start_time,end_time,min(id),count(1) AS count FROM leo.test GROUP BY start_time,end_time HAVING count>1;
  -- Set done when the cursor runs out of rows.
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  OPEN my_cur;
  myloop: LOOP
    FETCH my_cur INTO v_start_time,v_end_time,min_id,v_count;
    IF done=1 THEN
      LEAVE myloop;
    END IF;
    -- Keep the row with the smallest id in the group and delete the rest.
    DELETE FROM leo.test WHERE start_time=v_start_time AND end_time=v_end_time AND id>min_id;
    COMMIT;
  END LOOP myloop;
  CLOSE my_cur;
END;
//
DELIMITER ;

 

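The logic is clear: for each group of duplicates defined by the duplicate criteria, delete the rows whose primary key is greater than the group's smallest primary key.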
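
The same keep-the-smallest-id rule can also be expressed as a single set-based DELETE. The sketch below is not the stored procedure above: it uses Python's built-in sqlite3 module with an in-memory table as a stand-in for MySQL, and the table name test, the sample rows and the function name delete_duplicates are made up for illustration.

```python
import sqlite3


def delete_duplicates(conn):
    """Keep the row with the smallest id in each (start_time, end_time) group."""
    cur = conn.execute(
        """
        DELETE FROM test
        WHERE id NOT IN (
            SELECT MIN(id) FROM test GROUP BY start_time, end_time
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of duplicate rows removed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, start_time TEXT, end_time TEXT)"
)
conn.executemany(
    "INSERT INTO test (id, start_time, end_time) VALUES (?, ?, ?)",
    [
        (1, "2018-08-01 08:00:00", "2018-08-01 09:00:00"),
        (2, "2018-08-01 08:00:00", "2018-08-01 09:00:00"),  # duplicate of id 1
        (3, "2018-08-01 10:00:00", "2018-08-01 11:00:00"),
        (4, "2018-08-01 08:00:00", "2018-08-01 09:00:00"),  # duplicate of id 1
    ],
)
removed = delete_duplicates(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM test ORDER BY id")]
print(removed, remaining)
```

Note that MySQL rejects a subquery that reads from the table being deleted from (error 1093), so on MySQL the inner SELECT would need to be wrapped in a derived table or rewritten as a self-join. The cursor-based procedure avoids that restriction by deleting one group at a time.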

  MySQL version: 5.7.17

While writing it, however, I ran into a really nasty bug. My first version looked like this:

  Download URL: https://dev.mysql.com/downloads/mysql/

DELIMITER //
DROP PROCEDURE IF EXISTS Del_Dup_FOR_TEST;
CREATE PROCEDURE Del_Dup_FOR_TEST()
BEGIN
DECLARE min_id INT;
DECLARE start_time,end_time DATETIME;
DECLARE count INT;
DECLARE done INT DEFAULT 0;
DECLARE my_cur CURSOR FOR SELECT start_time,end_time,min(id),count(1) AS count FROM leo.test GROUP BY start_time,end_time HAVING count>1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN my_cur;
  myloop: LOOP
  FETCH my_cur INTO start_time,end_time,min_id,count;
  IF done=1 THEN
  LEAVE myloop;
  END IF;
  DELETE FROM leo.test WHERE start_time=start_time AND end_time=end_time AND id>min_id;
  COMMIT;
  END LOOP myloop;
CLOSE my_cur;
END;
//
DELIMITER ;

  Client tool: Navicat for MySQL

The difference lies in the names of the declared variables, namely:

  Portable version download URL:

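The variable names used in FETCH INTO must never be column names or column aliases returned by the SQL statement in your CURSOR definition. In other words, a variable name must be neither an existing column name in the table nor an alias used when declaring the cursor (such as count in this example). If either condition is violated, FETCH INTO assigns NULL to all the variables. You can verify this by adding a SELECT right after the FETCH INTO to print the variables.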


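Before learning about this bug, I went to the relevant page in the official documentation to check whether my syntax was wrong. The syntax was fine, but the second-to-last comment on that page suggested this might be a column-name shadowing bug. The last comment disputed that it was a bug, but I had no other option, so I made the changes above based on the bug report, and then the procedure worked correctly.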

 

For the bug reports on this issue, see MySQL BUG #28227 and BUG #5967.


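Now back to the last comment under the official documentation. At first I thought that comment disputing the bug was complete nonsense: what idiot says this is not a bug? After thinking it over carefully, I realized they were both right. It can fairly be called a bug, and the idiot here was also me.

 II. Installing MySQL

Here are the last two comments on that page (as of 2018.08.01):

 Installation requirements: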

Posted by Brent Roady on May 9, 2012
It should be noted that the local variable names used in FETCH [cursor] INTO must be different than the variable names used in the SELECT statement 
defining the CURSOR. Otherwise the values will be NULL. 
In this example, 
DECLARE a VARCHAR(255);
DECLARE cur1 CURSOR FOR SELECT a FROM table1;
FETCH cur1 INTO a;
the value of a after the FETCH will be NULL.
This is also described here: http://bugs.mysql.com/bug.php?id=28227

Posted by Jérémi Lassausaie on February 3, 2015
Answer for Brent Roady :
I don't see any bug in the bahaviour described.
DECLARE a VARCHAR(255);
/* you declare a variable "a" without a specified default value, a=NULL */
DECLARE cur1 CURSOR FOR 
SELECT a FROM table1;
/* You declare a cursor that selects "a" FROM a table */
OPEN cur1;
/* You execute your cursor query, a warning is raised because a is ambiguously defined but you don't see it */
FETCH cur1 INTO a;
/* you put your unique field in your unique row into a (basically you do "SET a=a;") so a is still NULL */
There is no bug report, just a misunderstanding.

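  1) .NET Framework 4.0 (download URL: )

Brent ran into the same situation as I did, and included a link to the bug report.

  If installing .NET Framework 4.0 on Windows Server 2003 fails with a blocking issue stating that the 32-bit Windows Imaging Component (WIC) must be installed before running setup,

Jérémi (probably a programmer) replied that this is clearly a misunderstanding: once you declare a variable a (initial value NULL), FETCH INTO a is equivalent to SET a=a, which would be just as unresolvable in any other programming language.

  download and install the corresponding file from Microsoft's official site:

So giving declared variables a prefix when writing stored procedures is a good habit. I recall that in Oracle stored procedures I always used the v_ prefix, and SQL Server uses the @ prefix, yet when it came to MySQL I neglected it. This is worth remembering.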
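
The name resolution behind this can be imitated with a toy model in Python. It only illustrates the rule that, inside a stored routine, a declared local variable takes precedence over a column with the same name; resolve and fetch_into are hypothetical helpers, not MySQL APIs, and the row contents are made up.

```python
def resolve(name, local_vars, row):
    """Toy lookup: a declared local variable hides a column of the same name."""
    if name in local_vars:
        return local_vars[name]
    return row[name]


def fetch_into(select_names, targets, local_vars, row):
    """Evaluate the cursor's select list, then assign the values to the targets."""
    values = [resolve(name, local_vars, row) for name in select_names]
    for target, value in zip(targets, values):
        local_vars[target] = value
    return local_vars


row = {"a": "hello"}  # one row of a made-up table1

# DECLARE a; cursor selects a; FETCH INTO a  -> behaves like SET a = a
shadowed = fetch_into(["a"], ["a"], {"a": None}, row)

# DECLARE v_a; cursor selects a; FETCH INTO v_a  -> reads the column
prefixed = fetch_into(["a"], ["v_a"], {"v_a": None}, row)

print(shadowed["a"], prefixed["v_a"])
```

In the first call the select list never reaches the column, so the variable stays NULL (None here). With the v_ prefix the name a can only mean the column, and the fetch returns its value.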

  32-bit:

  64-bit:

 

1. The downloaded MySQL installer is named mysql_installer_community_V5.6.21.1_setup.1418020972.msi.

2. Double-click it to open the installer window. (If the system shows a prompt, choose Allow.)

3. The installer's start screen appears.

4. Check I accept the license terms.

5. Click Next to open the next window.

6. Click Next again to reach the ready-to-install screen.

7. Click Execute to install.
