Reputation: 33
Currently I have a problem with SQL. I am working with SQL Server 2014.
Some background: I uploaded lots of weather data from .csv
files into a table named dbo.import
. Because the data is not adjusted/corrected so I chose mostly "text" as data type.
I have many duplicate rows identified and I wrote them in a new table named dbo.duplikate
INSERT INTO dbo.duplikate
SELECT
airportCode, CAST(DateUTC AS VARCHAR(25)), Count(*)
FROM
dbo.import
GROUP BY
airportCode, CAST(DateUTC AS VARCHAR(25))
HAVING
Count(*) > 1
(Maybe there is a way available how I can write all rows in the new table through a similar statement so I don't need a cursor(?)?)
Now I try to write all duplicates, with all rows (in total 16), in a new table.
My idea was iterate through my table dbo.duplikate
row for row and compare this in my WHERE
clause.
Something like this (pseudo-code):
INSERT INTO dbo.newTable
SELECT *
FROM dbo.import
WHERE dbo.import.DateUTC LIKE dbo.duplikate.DateUTC[i]
AND dbo.import.airportCode LIKE dbo.duplikate.airportCode[i]
Through google I find cursors. But I am not sure if this is the right way. Every try failed because I don't know how to assign the rows to my WHERE clause ...
Regards, Julian
Upvotes: 1
Views: 442
Reputation: 34774
There's nothing in your question that makes me believe you need a cursor. Cursors can be very useful, but performance is almost always better when you perform actions on sets of data rather than looping.
In SQL Server you can use a ROW_NUMBER()
or a windowed COUNT()
to identify duplicates without losing the detail as you do in your query:
;with cte AS (SELECT *,ROW_NUMBER() OVER(PARTITION BY airportCode, CAST(DateUTC AS VARCHAR(25)) ORDER BY airportCode) RN
,COUNT(*) OVER(PARTITION BY airportCode, CAST(DateUTC AS VARCHAR(25))) Dup_CT
FROM dbo.import)
SELECT *
FROM cte
From there you can add in WHERE
criteria to suit your needs, for example if you just want to put all records that have a duplicate into a new table you'd use WHERE Dup_CT > 1
. You can also DELETE
from the cte if you want to remove duplicate records:
;with cte AS (SELECT *,ROW_NUMBER() OVER(PARTITION BY airportCode, CAST(DateUTC AS VARCHAR(25)) ORDER BY airportCode) RN
,COUNT(*) OVER(PARTITION BY airportCode, CAST(DateUTC AS VARCHAR(25))) Dup_CT
FROM dbo.import)
DELETE FROM cte
WHERE RN > 1
You can adjust the ORDER BY
clause in the ROW_NUMBER()
function to specify which record you'll keep if performing the above DELETE
.
Upvotes: 1