Reputation: 33
I have a database with approximately 10 million rows (and 20 columns - about 4 GB) where about 10% of the rows have a duplicate column. Database is in SQL Server 2014 Express and using SSMS.
I created a new column CNT (int, null) to count the occurrences of each row where I have a duplicate ID. Desired result would look like:
ID CNT
100 1
100 2
101 1
102 1
102 2
103 1
104 1
Not being really familiar with advanced SQL capabilities I did some research and came up with using a CTE to set the CNT column. Worked fine on a small test table - but it was obvious this is not the way to go for a large table (I killed it after 5+ hours on a pretty decent system.)
Here's the code that I attempted to implement:
with CTE as
(select dbo.database.id, dbo.database.cnt,
RN = row_number() over (partition by id order by id)
from dbo.databasee)
update CTE set CNT = RN
Column ID is of type Int. All columns allow nulls - there are no keys or indexed columns.
Upvotes: 3
Views: 1468
Reputation: 2897
Edit: Martin is right, I can only offer an alternate solution than the CTE at the moment. Make a new table exactly like your old one, and insert the old table's data into it with this.
INSERT INTO newTable
SELECT ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM oldTable;
Then you can delete your old table. Definitely not a perfect solution, but it should work.
Upvotes: 2