I have a requirement where i need to remove some rows after Joining two tables using UNION ALL. Here are the Tables Accounts1 id username department salary 1 Sam IT 2000 2 Frodo Accounts 1000 3 Natan Service 800 4 Kenworth Admin 900 Accounts2 id username department salary 5 Sam IT 1600 6 Frodo Accounts 800 Expected Result of the UNION should be id username department salary 5 Sam IT 1600 6 Frodo Accounts 800 3 Natan Service 800 4 Kenworth Admin 900 As seen the expected result should contain the records of the least salary from the accounts2 table replacing the records from the accounts1. I have tried with Distinct but that doesnot resolve the requirement. Any help is greatly appreciated

Reputation: 113

Postgresql : Removing Duplicates after performing UNION ALL

I have a requirement where i need to remove some rows after Joining two tables using UNION ALL. Here are the Tables

Accounts1

id	username	department	salary
1	Sam	IT	2000
2	Frodo	Accounts	1000
3	Natan	Service	800
4	Kenworth	Admin	900

Accounts2

id	username	department	salary
5	Sam	IT	1600
6	Frodo	Accounts	800

Expected Result of the UNION should be

id	username	department	salary
5	Sam	IT	1600
6	Frodo	Accounts	800
3	Natan	Service	800
4	Kenworth	Admin	900

As seen the expected result should contain the records of the least salary from the accounts2 table replacing the records from the accounts1. I have tried with Distinct but that doesnot resolve the requirement. Any help is greatly appreciated

Upvotes: 0

Answers (2)

Chris Maurer

Reputation: 2539

After UNIONing the two sets, I would calculate a Row_number() ON (Group By department, username Order By salary, id). Then I would wrap that one in one more Select to filter and retain only row_number = 1.

A little more code, but very explicit as to what is being performed and it has the advantage that if either data set happens to contain multiple values for a user you still get the one with the lowest salary.

This is a problem that comes up often where there are multiple records within a group domain and you want to choose "the best" one even if you can't say exactly which one that is. The row_number() window function allows your Order By to make the best choice float to the top where it will assign the row_number of 1. You can then filter and retain only the row_numbers=1 as the "best" choice within each domain. This always means at least two Select statements because window functions are evaluated after Where and Having clauses.

Upvotes: 0

Gordon Linoff

Reputation: 1269493

You can use union all with filtering:

select a2.*
from accounts2 a2
union all
select a1.*
from accounts1 a1
where not exists (select 1
                  from accounts2 a2
                  where a2.username = a1.username and a2.department = a1.department
                 );

EDIT:

If you want one row per username or username/department from either table with the minimum salary, then I would suggest union all with distinct on:

select distinct on (username, department) a.*
from ((select a1.*
       from accounts a1
      ) union all
      (select a2.*
       from accounts a2
      )
     ) a
order by username, department, salary;

Remove department accordingly if you want one row per employee.

Upvotes: 2

Postgresql : Removing Duplicates after performing UNION ALL

Answers (2)

Related Questions