Dito Khelaia
Dito Khelaia

Reputation: 138

How to merge duplicate records in MySQL

I have post_view_counters table with 11 million row.

id    post_id    start_date    end_date    views
_________________________________________________
1       55          XXXX         YYYY        90
2       55          XXXX         YYYY        1
3       55          XXXX         YYYY        1

there are duplicate records for some reason (bug in backend) which already fixed. I need merge all rows views with same post_id, start_date and end_date result should be like this after update table

id    post_id    start_date    end_date    views
_________________________________________________
1       55          XXXX         YYYY        92

Upvotes: 1

Views: 334

Answers (4)

forpas
forpas

Reputation: 164099

First you must update the rows with the min id for each post_id, start_date, end_date:

update tablename t inner join (
  select sum(views) views, min(id) id from tablename
  group by post_id, start_date, end_date
) tt
on tt.id = t.id 
set t.views = tt.views;

and then delete all the other ids keeping only the min id:

delete t 
from tablename t inner join tablename tt
on tt.post_id = t.post_id 
and tt.start_date = t.start_date and tt.end_date = t.end_date 
and t.id > tt.id;

Since this a large table, proper indexes are needed to make the process run as fast as possible.

See the demo.
For this table:

CREATE TABLE tablename (
  `id` INTEGER,
  `post_id` INTEGER,
  `start_date` VARCHAR(4),
  `end_date` VARCHAR(4),
  `views` INTEGER
);

INSERT INTO tablename
  (`id`, `post_id`, `start_date`, `end_date`, `views`)
VALUES
  ('1', '55', 'XXXX', 'YYYY', '90'),
  ('2', '55', 'XXXX', 'YYYY', '1'),
  ('3', '55', 'XXXX', 'YYYY', '1'),
  ('4', '65', 'AAAA', 'BBBB', '10'),
  ('5', '65', 'AAAA', 'BBBB', '2'),
  ('6', '65', 'AXXX', 'BYYY', '100'),
  ('7', '65', 'AXXX', 'BYYY', '200'),
  ('8', '75', 'CCCC', 'CCCC', '1');

Results:

| id  | post_id | start_date | end_date | views |
| --- | ------- | ---------- | -------- | ----- |
| 1   | 55      | XXXX       | YYYY     | 92    |
| 4   | 65      | AAAA       | BBBB     | 12    |
| 6   | 65      | AXXX       | BYYY     | 300   |
| 8   | 75      | CCCC       | CCCC     | 1     |

Upvotes: 0

FanoFN
FanoFN

Reputation: 7114

I would do the safe approach:

First, create a new table -

CREATE TABLE post_view_counters_new LIKE post_view_counters;

Then insert the data into the new table-(syntax by @scaisEdge)

INSERT INTO post_view_counters_new 
SELECT MIN(id) id,  post_id ,   start_date ,   end_date ,   SUM( views) views
FROM post_view_counters
GROUP  BY  post_id ,   start_date ,   end_date; 

After that, compare data between new table and old table. Once you're satisfied, rename old table to something like 'post_view_counters_old' and rename 'post_view_counters_new' to 'post_view_counters'. So, in case you're missing anything inside the new table, you still can refer back the old table.

Upvotes: 3

Fahmi
Fahmi

Reputation: 37473

You can try below -

select min(id),post_id,    start_date ,   end_date,sum(views)
from tablename
group by post_id, start_date, end_date

Upvotes: 0

ScaisEdge
ScaisEdge

Reputation: 133370

for the same post_id , start_date , end_date you could use aggregation function as min(id) , sum(view ) and group by

select min(id) id,  post_id ,   start_date ,   end_date ,   sum( views) views
from my_table  
group  by  post_id ,   start_date ,   end_date

Upvotes: 0

Related Questions