Saturday, April 6, 2013

Finding Duplicate Rows

Today I am going to describe the query which fetches duplicate rows.Its not a big task but when I started to think as a beginner I felt an urge to write about it.
This is the table structure I used for the query.Here "SD_Id"is unique where as other fields are repetitive.
CREATE TABLE [dbo].[SaleDispatch](
    [SD_Id] [bigint] IDENTITY(1,1) NOT NULL,
    [SO_Id] [bigint] NULL,
    [Date] [datetime] NULL,
    [TotAmt] [money] NULL,
   ) ON [PRIMARY]


I am going to find out the rows where  [SO_Id] is duplicate. We will use  COUNT() .
So as a beginner one can come up with this query -:
SELECT  [SO_Id], COUNT(*) AS dupeCount
FROM [dbo].[SaleDispatch]


And encountered by this error -:
Msg 8120, Level 16, State 1, Line 1
Column 'dbo.SaleDispatch.SO_Id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.


This error occurs bcoz we are using COUNT() which is an aggregate function where as select list contains SO_Id which has no any aggregate function.So in this situation we have to use GROUP BY clause.Now the query will be -:
SELECT  [SO_Id], COUNT(*) AS dupeCount
 FROM [dbo].[SaleDispatch]
 GROUP BY [SO_Id]


This query returns all rows and we want duplicate only then we have to put condition.
SELECT  [SO_Id], COUNT(*) AS dupeCount
 FROM [dbo].[SaleDispatch]
 GROUP BY [SO_Id] 
 HAVING COUNT(*) > 1


Now we want other columns also then we have to use join.And now the final query arrives as this one -:
SELECT SD.[SD_Id],SD.[SO_Id],SD.[Date],SD.[TotAmt]
FROM [dbo].[SaleDispatch]  as SD  
inner join

    SELECT  [SO_Id], COUNT(*) AS cnt
    FROM [dbo].[SaleDispatch]
    GROUP BY [SO_Id] 
    HAVING COUNT(*) > 1
) as SDC
on SD.SO_Id=SDC.SO_Id
order by SD.SO_Id