Reputation: 226
Here is my data:
ID FName LName data1 data2
1 John Doe xxx1
2 John Doe xxx2 yyy2
And here is my desired result:
ID FName LName data1 data2
1 John Doe xxx1 yyy2
In short, I have a table where are a lot of people, and that table is filled from multiple sources with different data and IDs. What I want is, for each duplicate I found and for each column in the table view if there is data present in that cell, then, if it exists, try to dump it to the oldest record for that person, if there is data, do nothing.
I don't know if I made myself clear.
What should be the best approach to do this? Should I write a stored procedure or it can be done with a clever query I haven't came up with yet?
Upvotes: 2
Views: 814
Reputation: 28641
You can create a custom aggregate like that:
CREATE FUNCTION remember_first(acc text, newval text) RETURNS text AS $$
BEGIN
RETURN COALESCE(acc, newval);
END;
$$ LANGUAGE plpgsql IMMUTABLE;
CREATE AGGREGATE first(text) (
sfunc = remember_first,
stype = text
);
It will return the first not null
value. Then:
SELECT FName, LName, first(data1), first(data2)
FROM your_table
GROUP BY FName, LName
ORDER BY FName, LName, id -- or your ordering columns
To get the data you need. Last - just use this SELECT
to update the records. Or just create a VIEW
with desired data.
P.S. The aggregate function is from Custom aggregate function
Upvotes: 2
Reputation: 20398
You can solve this with a query using joins and window functions:
select nodups.id, nodups.fname, nodups.lname, d1.data1, d2.data2
from
(select min(id) as id, fname, lname from sample group by fname, lname) nodups
left join
(select fname, lname, min(data1) as data1
from (select fname, lname
, first_value(data1) over (partition by fname, lname order by id) as data1
from sample where data1 is not null) d1x
group by fname, lname
) d1 using (fname, lname)
left join
(select fname, lname, min(data2) as data2
from (select fname, lname
, first_value(data2) over (partition by fname, lname order by id) as data2
from sample where data2 is not null) d2x
group by fname, lname
) d2 using (fname, lname)
order by id
;
Try testing this approach with your real data against Igor's custom aggregate to see which performs better.
Upvotes: 2