raghuveer
raghuveer

Reputation: 113

Postgresql : Removing Duplicates after performing UNION ALL

I have a requirement where i need to remove some rows after Joining two tables using UNION ALL. Here are the Tables

Accounts1

id username department salary
1 Sam IT 2000
2 Frodo Accounts 1000
3 Natan Service 800
4 Kenworth Admin 900

Accounts2

id username department salary
5 Sam IT 1600
6 Frodo Accounts 800

Expected Result of the UNION should be

id username department salary
5 Sam IT 1600
6 Frodo Accounts 800
3 Natan Service 800
4 Kenworth Admin 900

As seen the expected result should contain the records of the least salary from the accounts2 table replacing the records from the accounts1. I have tried with Distinct but that doesnot resolve the requirement. Any help is greatly appreciated

Upvotes: 0

Views: 1380

Answers (2)

Chris Maurer
Chris Maurer

Reputation: 2539

After UNIONing the two sets, I would calculate a Row_number() ON (Group By department, username Order By salary, id). Then I would wrap that one in one more Select to filter and retain only row_number = 1.

A little more code, but very explicit as to what is being performed and it has the advantage that if either data set happens to contain multiple values for a user you still get the one with the lowest salary.

This is a problem that comes up often where there are multiple records within a group domain and you want to choose "the best" one even if you can't say exactly which one that is. The row_number() window function allows your Order By to make the best choice float to the top where it will assign the row_number of 1. You can then filter and retain only the row_numbers=1 as the "best" choice within each domain. This always means at least two Select statements because window functions are evaluated after Where and Having clauses.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269493

You can use union all with filtering:

select a2.*
from accounts2 a2
union all
select a1.*
from accounts1 a1
where not exists (select 1
                  from accounts2 a2
                  where a2.username = a1.username and a2.department = a1.department
                 );

EDIT:

If you want one row per username or username/department from either table with the minimum salary, then I would suggest union all with distinct on:

select distinct on (username, department) a.*
from ((select a1.*
       from accounts a1
      ) union all
      (select a2.*
       from accounts a2
      )
     ) a
order by username, department, salary;

Remove department accordingly if you want one row per employee.

Upvotes: 2

Related Questions