Delete duplicate records from a SQL Server table

Today one of my colleague asked me how to delete duplicate records from a SQL Server Table. So to answer his question, let’s check the following example.

Note: The below example works on SQL Server 2005 and later versions.

–Create Table

CREATE TABLE dbo.TestDuplicate (EmpNo INT, EmpName VARCHAR(10));

–Insert Test Data

INSERT INTO dbo.TestDuplicate VALUES (1,‘Emp 1’);

INSERT INTO dbo.TestDuplicate VALUES (1,‘Emp 1’);

INSERT INTO dbo.TestDuplicate VALUES (2,‘Emp 2’);

INSERT INTO dbo.TestDuplicate VALUES (2,‘Emp 2’);

INSERT INTO dbo.TestDuplicate VALUES (3,‘Emp 3’);

–Remove Duplicates

WITH CTE
AS (SELECT ROW_NUMBER () OVER (ORDER BY EmpNo) SrNo , EmpNo FROM dbo.TestDuplicate)
DELETE FROM CTE WHERE SrNo NOT IN (SELECT MAX(SrNo) FROM CTE GROUP BY EmpNo);

–Check the table

SELECT * FROM dbo.TestDuplicate;

— Data before deleting the duplicate records.

image

— Data after deleting the duplicate records.

image

Points to Note:

  1. This was possible because of the ROW_NUMBER() function available in SQL 2005 onwards.
  2. ROW_NUMBER() function Returns the sequential number of a row within a partition of a result set. This means for each row in a result set it will return one unique sequential number for that row.
  3. The OVER clause determines the partitioning and ordering of the row set.
  4. As we have used CTE (Common Table Expression) it allows us to reuse the result set like a temporary table.

Hope this article may help you in deleting the duplicate records.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s