Chuck
Chuck

Reputation: 1293

how to join tables by comparing two fields, and also considering performance

This should be simple but I can't get it. I need to do a select to get newer date values for some accounts.

I start here, T1:

+----------+---------+
|  date   | account |
+----------+---------+
| 4/1/2018 |       1 |
| 4/1/2018 |       2 |
| 4/1/2018 |       3 |
| 4/1/2018 |       4 |
| 4/1/2018 |       5 |
+----------+---------+

Then some dates are updated in T2:

+----------+---------+
|   date   | account |
+----------+---------+
| 7/1/2018 |       1 |
| 7/1/2018 |       2 |
+----------+---------+

How can I get this output into T3, updating just those accounts?

+----------+---------+
|   date   | account |
+----------+---------+
| 7/1/2018 |       1 |
| 7/1/2018 |       2 |
| 4/1/2018 |       3 |
| 4/1/2018 |       4 |
| 4/1/2018 |       5 |
+----------+---------+

I can do a join on the account number, but what about the ones that didn't change? How to capture those?

Also, T1 has about 8 million records, so performance would be a factor. Extracting from Teradata, loading into Hive.

Thanks!

Upvotes: 2

Views: 155

Answers (3)

vikrant rana
vikrant rana

Reputation: 4679

Just an addition to previously good answers.. try using it with coalesce also and let me know if it improves performance.

select t1.Account, coalesce(t2.Date, t1.Date) 
from t1
left outer join t2
  on t2.Account = t1.Account

Upvotes: 8

Eray Balkanli
Eray Balkanli

Reputation: 7990

Here is another solution with left outer join:

select t1.Account, case when t2.Date is null then t1.Date else t2.Date end
from t1
left outer join t2 on t2.Account = t1.Account

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1270583

I think you want:

select t2.*
from t2
union all
select t1.*
from t1
where not exists (select 1 from t2 where t2.account = t1.account);

This selects first from t2. Then it takes remaining accounts from t1.

Upvotes: 3

Related Questions