A Detailed Guide to Deleting Duplicate Data in SQL

2013/5/27 22:26:14 Source: original to this site

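Duplicate data usually comes in two forms: fully duplicated records, where every column holds the same value, and partially duplicated records, where only some columns repeat.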

I. Deleting fully duplicated records

Fully duplicated data is usually the result of a table having no primary key or unique constraint.

Test data:

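-- Recreate the test table (no primary key or unique constraint, so identical rows are allowed)
if OBJECT_ID('duplicate_all') is not null
    drop table duplicate_all;

create table duplicate_all
(
    c1 int,
    c2 int,
    c3 varchar(100)
);

-- Note: these five rows are all distinct; run this insert more than once to produce fully duplicated rows
insert into duplicate_all
select 1,100,'aaa' union all
select 2,200,'bbb' union all
select 3,300,'ccc' union all
select 4,400,'ddd' union all
select 5,500,'eee';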

(1) Using a temporary table

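Use DISTINCT to get a single copy of each record, delete the source rows, then load the deduplicated rows back.

If the table is small, you can export all of its records once, truncate the table, and load them back; this avoids the row-by-row logging that DELETE incurs.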

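if OBJECT_ID('tempdb..#tmp') is not null
    drop table #tmp;

-- Keep one copy of each distinct row for c1 = 1
select distinct * into #tmp
from duplicate_all
where c1 = 1;

-- Remove every copy from the source, then load the single copies back
delete from duplicate_all where c1 = 1;

insert into duplicate_all
select * from #tmp;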
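The whole-table variant described above (export, truncate, reload) might look roughly like the following. This is only a sketch: the temp table name #all_rows is made up for illustration, and it assumes no foreign keys reference duplicate_all, since TRUNCATE TABLE would fail otherwise.

```sql
-- Sketch: deduplicate the whole table with TRUNCATE instead of DELETE.
-- #all_rows is a hypothetical name; assumes no foreign keys reference duplicate_all.
if OBJECT_ID('tempdb..#all_rows') is not null
    drop table #all_rows;

-- Copy one instance of every distinct row into the temp table.
select distinct * into #all_rows
from duplicate_all;

-- Empty the table and reload it inside one transaction,
-- so a failure during the reload does not leave the table empty.
begin transaction;
    truncate table duplicate_all;

    insert into duplicate_all (c1, c2, c3)
    select c1, c2, c3 from #all_rows;
commit transaction;
```

Because the full set of distinct rows is copied twice, this suits small tables, as the text notes.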

(2) Using ROW_NUMBER

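-- Number the rows within each group of identical (c1,c2,c3) values;
-- the ORDER BY is arbitrary because rows in a group are indistinguishable
;with tmp as
(
    select *, ROW_NUMBER() OVER(PARTITION BY c1,c2,c3 ORDER BY(getdate())) as num
    from duplicate_all
    where c1 = 1
)
-- Deleting through the CTE removes the underlying rows; only num = 1 survives
delete from tmp where num > 1;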

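If several tables contain fully identical rows, consider combining them with UNION and inserting the result into a new table with the same structure; SQL Server then removes the duplicate rows, both within each table and across tables.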
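As a rough illustration of the UNION approach, the sketch below merges duplicate_all with a second table of the same structure. The names duplicate_all_2 and merged_all and the sample rows are hypothetical and exist only for this example.

```sql
-- Hypothetical second table with the same structure as duplicate_all.
if OBJECT_ID('duplicate_all_2') is not null
    drop table duplicate_all_2;

create table duplicate_all_2
(
    c1 int,
    c2 int,
    c3 varchar(100)
);

insert into duplicate_all_2
select 1,100,'aaa' union all   -- also present in duplicate_all
select 6,600,'fff';

if OBJECT_ID('merged_all') is not null
    drop table merged_all;

-- UNION (without ALL) returns each distinct row once,
-- so rows repeated within or across the two tables collapse to a single copy.
select c1, c2, c3
into merged_all
from
(
    select c1, c2, c3 from duplicate_all
    union
    select c1, c2, c3 from duplicate_all_2
) as u;

select * from merged_all;
```

UNION ALL would keep every copy, so the deduplication depends on using plain UNION.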

II. Deleting partially duplicated records

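When only some columns are duplicated, the table usually does have a primary key; the repeated column values across rows are typically caused by application logic.

Test data: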

if OBJECT_ID('duplicate_col') is not null
    drop table duplicate_col;

create table duplicate_col
(
    c1 int primary key,
    c2 int,
    c3 varchar(100)
);

-- Rows 1 to 4 share the same (c2,c3) values; only c1 differs
insert into duplicate_col
select 1,100,'aaa' union all
select 2,100,'aaa' union all
select 3,100,'aaa' union all
select 4,100,'aaa' union all
select 5,500,'eee';

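(1) Unique index

A unique index has an option to ignore duplicate keys (IGNORE_DUP_KEY), which can be specified when creating a primary key or unique constraint. With the option on, an inserted row that would duplicate an existing key is discarded with a warning instead of failing the whole insert.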

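if OBJECT_ID('tmp') is not null
    drop table tmp;

create table tmp
(
    c1 int,
    c2 int,
    c3 varchar(100),
    -- Rows that would repeat an existing (c2,c3) pair are skipped with a warning
    constraint UQ_01 unique(c2,c3) with(IGNORE_DUP_KEY = ON)
);

insert into tmp
select * from duplicate_col;

-- tmp now holds one row per distinct (c2,c3) pair
select * from tmp;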

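(2) Deleting with the help of a primary key or unique key

The usual choice is to keep the row with the largest or smallest primary key/unique key value and delete the rest. The statements below keep only the row with the smallest c1 in each group of duplicates.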

-- Delete a row when another row with the same (c2,c3) has a smaller c1
delete from duplicate_col
where exists(select 1 from duplicate_col b where duplicate_col.c1 > b.c1 and (duplicate_col.c2 = b.c2 and duplicate_col.c3 = b.c3));

-- or: keep only the minimum c1 of each (c2,c3) group
delete from duplicate_col
where c1 not in (select min(c1) from duplicate_col group by c2,c3);

To keep the Nth row of each duplicate group instead, see "05. Selecting certain rows from each group".
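One possible way to keep the Nth row is sketched below. It is not taken from the referenced article; the variable @n and the CTE name ranked are illustrative, and it assumes "Nth" means the Nth row when each group is ordered by c1.

```sql
-- Sketch: keep only the Nth row (ordered by c1) of each (c2,c3) group.
-- @n and "ranked" are illustrative names, not from the referenced article.
declare @n int;
set @n = 2;

with ranked as
(
    select c1,
           ROW_NUMBER() OVER(PARTITION BY c2,c3 ORDER BY c1) as rn,   -- position within the group
           COUNT(*) OVER(PARTITION BY c2,c3) as cnt                   -- size of the group
    from duplicate_col
)
delete d
from duplicate_col as d
join ranked as r on r.c1 = d.c1
where r.cnt >= @n      -- groups with fewer than @n rows are left untouched
  and r.rn <> @n;      -- in larger groups, every row except row @n is removed
```

Run against freshly loaded test data with @n = 2, this should leave the rows with c1 = 2 and c1 = 5. Leaving small groups untouched is a design choice; without the cnt check, a group with fewer than @n rows would be deleted entirely.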

(3) ROW_NUMBER

The statement is essentially the same as the one for deleting fully duplicated records.

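-- Number the rows within each (c2,c3) group; ORDER BY getdate() keeps an arbitrary row,
-- use ORDER BY c1 instead to keep the smallest key deterministically
;with tmp as
(
    select *, ROW_NUMBER() OVER(PARTITION BY c2,c3 ORDER BY(getdate())) as num
    from duplicate_col
)
delete from tmp where num > 1;

select * from duplicate_col;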
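After any of the cleanups above, it can be worth confirming that no duplicates remain and then guarding against new ones. The sketch below assumes (c2, c3) is meant to be unique from now on; the constraint name UQ_duplicate_col_c2_c3 is made up for illustration.

```sql
-- Any (c2,c3) pair returned here still has more than one row.
select c2, c3, COUNT(*) as copies
from duplicate_col
group by c2, c3
having COUNT(*) > 1;

-- Assumption: (c2,c3) should be unique going forward; the constraint name is hypothetical.
-- This statement fails if duplicate (c2,c3) pairs are still present.
alter table duplicate_col
add constraint UQ_duplicate_col_c2_c3 unique (c2, c3);
```

With the constraint in place, later inserts that repeat a (c2, c3) pair are rejected, which addresses the missing-constraint cause mentioned at the start of the article.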
