Consolidate rows in PostgreSQL

Question

Here is my data:

ID      FName   LName   data1   data2
1       John    Doe     xxx1    
2       John    Doe     xxx2    yyy2

And here is my desired result:

ID      FName   LName   data1   data2
1       John    Doe     xxx1    yyy2

In short, I have a table holding a lot of people, and that table is filled from multiple sources with different data and IDs. For each set of duplicates, I want to check every column: if the oldest record for that person has no value in a column and one of the duplicates does, copy that value into the oldest record; if the oldest record already has a value, do nothing.

I don't know if I made myself clear.

What would be the best approach to do this? Should I write a stored procedure, or can it be done with a clever query I haven't come up with yet?

dbenhur · Accepted Answer

You can solve this with a query using joins and window functions:

select nodups.id, nodups.fname, nodups.lname, d1.data1, d2.data2
from
  -- one row per person, keyed by the oldest (lowest) id
  (select min(id) as id, fname, lname from sample group by fname, lname) nodups
left join
  -- first non-null data1 per person in id order; first_value repeats the
  -- same value on every row of the partition, so min() just collapses it
  (select fname, lname, min(data1) as data1
   from (select fname, lname
           , first_value(data1) over (partition by fname, lname order by id) as data1
         from sample where data1 is not null) d1x
   group by fname, lname
  ) d1 using (fname, lname)
left join
  -- same pattern for data2
  (select fname, lname, min(data2) as data2
   from (select fname, lname
           , first_value(data2) over (partition by fname, lname order by id) as data2
         from sample where data2 is not null) d2x
   group by fname, lname
  ) d2 using (fname, lname)
order by id
;
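
The query above only selects the consolidated rows. If you also want to write the result back into the oldest record and drop the duplicates, as the question describes, something along these lines might work. This is a sketch under assumptions: the column types are guesses, the blank cell in the question is taken to be NULL, and the subquery uses array_agg with FILTER (PostgreSQL 9.4 or later) as a shorter stand-in for the joins above, not the same query. As in the query above, people are matched on fname and lname only, so two different people with the same name would be merged.

```sql
-- Hypothetical setup: the table name "sample" comes from the query above;
-- the column types are assumed.
create table sample (
  id    integer primary key,
  fname text,
  lname text,
  data1 text,
  data2 text
);

insert into sample (id, fname, lname, data1, data2) values
  (1, 'John', 'Doe', 'xxx1', null),
  (2, 'John', 'Doe', 'xxx2', 'yyy2');

begin;

-- Copy the first non-null value (in id order) of each column
-- into the oldest row of each person.
update sample s
set data1 = c.data1,
    data2 = c.data2
from (
  select min(id) as id,
         (array_agg(data1 order by id) filter (where data1 is not null))[1] as data1,
         (array_agg(data2 order by id) filter (where data2 is not null))[1] as data2
  from sample
  group by fname, lname
) c
where s.id = c.id;

-- Delete every row for which an older row with the same name exists.
delete from sample s
using sample keep
where keep.fname = s.fname
  and keep.lname = s.lname
  and keep.id < s.id;

commit;

-- Remaining row: 1 | John | Doe | xxx1 | yyy2
select * from sample order by id;
```

Running the update and delete in one transaction lets you inspect the result and roll back instead of committing if the merge is not what you expected.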


Try testing this approach with your real data against Igor's custom aggregate to see which performs better.
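
Igor's answer is not reproduced here, so the following is only a guess at what a custom-aggregate approach could look like: a "first non-null" aggregate built on coalesce. The names coalesce_sfunc and first_non_null are made up for this sketch, and the CTE just stands in for the real sample table so the example runs on its own.

```sql
-- State function: keep the current state once it is non-null,
-- otherwise take the incoming value.
create function coalesce_sfunc(anyelement, anyelement)
returns anyelement
language sql immutable
as $$ select coalesce($1, $2) $$;

-- Aggregate returning the first non-null input in the order it is fed.
create aggregate first_non_null(anyelement) (
  sfunc = coalesce_sfunc,
  stype = anyelement
);

-- Stand-in data from the question; drop the CTE to query the real table.
with sample(id, fname, lname, data1, data2) as (
  values (1, 'John', 'Doe', 'xxx1', null),
         (2, 'John', 'Doe', 'xxx2', 'yyy2')
)
select min(id) as id, fname, lname
     , first_non_null(data1 order by id) as data1
     , first_non_null(data2 order by id) as data2
from sample
group by fname, lname
order by id;
-- Result: 1 | John | Doe | xxx1 | yyy2
```

This reads the table once instead of joining three subqueries, which is why it is worth comparing the two on real data.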
