Reputation: 4721
I have a table with four fields: ID auto increment, a string, and two integers. I want to do something of the sort:
select count(*) from table group by string
and then use the result to consolidate all counts which are larger than 1.
That is, take all rows which have count larger than 1, and replace all of these rows in the database (which have the same string) with a single row, ID does not matter, and the two integers are the sum over all rows of all of the rows with count larger than 1.
Is that possible using a few simple queries?
Thanks.
Upvotes: 1
Views: 157
Reputation: 50970
You can derive this information in a VIEW:
CREATE VIEW SummarizedData (StringCol, IntCol1, IntCol2, OriginalRowCount) AS
SELECT StringCol, SUM(IntCol1), SUM(IntCol2), COUNT(*)
FROM TableName
GROUP BY StringCol
This will create a virtual table with the information you want. It will include the rows for which there was only one instance of StringCol values as well — if you really don't want those add the phrase HAVING COUNT(*) > 1
to the end of the query.
With this method you can maintain the original table and just read from the summarized data or you can create an empty table structure with the appropriate columns and INSERT
from SummarizedData
into your new table to get a "real" table with the data.
Upvotes: 0
Reputation: 12721
You can do it all in a two queries, no temp tables. But you need to run the DELETE query repeatedly since it will only delete 1 duplicate at a time. So if there are 3 copies of a row, you would need to run it twice. But you can just run it until there are no more results.
Update the duplicate rows you are going to keep to contain the count/sum.
UPDATE tablename JOIN (
SELECT min(id) id,sum(int1) int1,sum(int2) int2
FROM tablename GROUP BY string HAVING c>1
) AS dups ON tablename.id=dups.id
SET tablename.int1=dups.int1, tablename.int2
Then you can use that same SELECT query in a DELETE query, using the multiple-table syntax.
DELETE tablename FROM tablename
JOIN (SELECT max(id) AS id,count(*) c FROM tablename GROUP BY string HAVING c>1) dups
ON tablename.id=dups.id
Just run that DELETE until there are no rows returned (0 affected rows).
Upvotes: 0
Reputation: 5650
If you can stop the table from being updated by other users, then it's pretty easy.
-- We're going to add records before deleting old ones, so keep track of which records are old.
DECLARE @OldMaxID INT
SELECT @OldMaxID = MAX(ID) FROM table
-- Combine duplicate records into new records
INSERT table (string, int1, int2)
SELECT string, SUM(int1), SUM(int2)
FROM table
GROUP BY string
HAVING COUNT(*) > 1
-- Delete records that were used to make combined records.
DELETE FROM table
WHERE ID <= @OldMaxID
GROUP BY string
HAVING COUNT(*) > 1
Upvotes: 0
Reputation: 19356
I would suggest to insert into temporary table data grouped by string AND accompanied by min(id) where there are duplicates. Then update original table with sums where id = min(id), and delete where strings match but ids don't.
insert into temp
select string, min(id) id, sum(int1) int1, sum(int2) int2
from table
group by string
having count(*) > 1
update table, temp
set table.int1 = temp.int1,
table.int2 = temp.int2
where table.id = temp.id
-- Works because there is only one record given a string in temp
delete table
where exists (select null from temp where temp.string = table.string and temp.id <> table.id)
Backup is mandatory :-) and a transaction also.
Upvotes: 1
Reputation: 359826
Start by selecting just the ones with count > 0
, and selecting the sums that you want:
select * from (
select count(*), string_col, sum(int_col_1), sum(int_col_2)
from my_table
group by string_col
) as foo where count > 1
After that, I would put that data into a temporary table, delete the rows you don't want, and insert the data from the temp table into the original one.
Upvotes: 0
Reputation: 10529
There's a simple way to do this. Just place something like
id NOT IN (select id from table group by string)
in your where statement, which will select only duplicates
Upvotes: 0