Reputation: 138
I have post_view_counters table with 11 million row.
id post_id start_date end_date views
_________________________________________________
1 55 XXXX YYYY 90
2 55 XXXX YYYY 1
3 55 XXXX YYYY 1
there are duplicate records for some reason (bug in backend) which already fixed. I need merge all rows views with same post_id, start_date and end_date result should be like this after update table
id post_id start_date end_date views
_________________________________________________
1 55 XXXX YYYY 92
Upvotes: 1
Views: 334
Reputation: 164099
First you must update the rows with the min id
for each post_id, start_date, end_date
:
update tablename t inner join (
select sum(views) views, min(id) id from tablename
group by post_id, start_date, end_date
) tt
on tt.id = t.id
set t.views = tt.views;
and then delete all the other ids keeping only the min id
:
delete t
from tablename t inner join tablename tt
on tt.post_id = t.post_id
and tt.start_date = t.start_date and tt.end_date = t.end_date
and t.id > tt.id;
Since this a large table, proper indexes are needed to make the process run as fast as possible.
See the demo.
For this table:
CREATE TABLE tablename (
`id` INTEGER,
`post_id` INTEGER,
`start_date` VARCHAR(4),
`end_date` VARCHAR(4),
`views` INTEGER
);
INSERT INTO tablename
(`id`, `post_id`, `start_date`, `end_date`, `views`)
VALUES
('1', '55', 'XXXX', 'YYYY', '90'),
('2', '55', 'XXXX', 'YYYY', '1'),
('3', '55', 'XXXX', 'YYYY', '1'),
('4', '65', 'AAAA', 'BBBB', '10'),
('5', '65', 'AAAA', 'BBBB', '2'),
('6', '65', 'AXXX', 'BYYY', '100'),
('7', '65', 'AXXX', 'BYYY', '200'),
('8', '75', 'CCCC', 'CCCC', '1');
Results:
| id | post_id | start_date | end_date | views |
| --- | ------- | ---------- | -------- | ----- |
| 1 | 55 | XXXX | YYYY | 92 |
| 4 | 65 | AAAA | BBBB | 12 |
| 6 | 65 | AXXX | BYYY | 300 |
| 8 | 75 | CCCC | CCCC | 1 |
Upvotes: 0
Reputation: 7114
I would do the safe approach:
First, create a new table -
CREATE TABLE post_view_counters_new LIKE post_view_counters;
Then insert the data into the new table-(syntax by @scaisEdge)
INSERT INTO post_view_counters_new
SELECT MIN(id) id, post_id , start_date , end_date , SUM( views) views
FROM post_view_counters
GROUP BY post_id , start_date , end_date;
After that, compare data between new table and old table. Once you're satisfied, rename old table to something like 'post_view_counters_old' and rename 'post_view_counters_new' to 'post_view_counters'. So, in case you're missing anything inside the new table, you still can refer back the old table.
Upvotes: 3
Reputation: 37473
You can try below -
select min(id),post_id, start_date , end_date,sum(views)
from tablename
group by post_id, start_date, end_date
Upvotes: 0
Reputation: 133370
for the same post_id , start_date , end_date you could use aggregation function as min(id) , sum(view ) and group by
select min(id) id, post_id , start_date , end_date , sum( views) views
from my_table
group by post_id , start_date , end_date
Upvotes: 0