Reputation: 396
I want to create a table by joining multiple source tables.
This table should have new entries or updates in last 24 hours from the source tables.
I will insert the new data and delete the updated data and reinsert it.
The problem is that the source tables are not updated at the same time.
What is the best way to keep this table up to date with the data from source tables.
Following example code will work if table 'a' is updated but what if table 'b' and 'c' are updated later how can I update my new table as well to get the updated fields from those tables?
I am using snowflake database.
insert into combined_table
select a.id, max(b.shipment_date), b.quantity, c.status
from table_a a
left join table_b b on a.id=b.a_id
left join table_c c on b.id=c.b_id
a.record_updated_at > dateadd(HOUR, -24, CURRENT_TIMESTAMP)
group by a.id, c.status
table_a
id created_at updated_at
1 2019-02-14 2019-02-16
table_b
id a_id shipment_date quantity created_at updated_at
3 1 2019-02-15 5 2019-02-15 2019-02-16
table_c
id b_id status created_at updated_at
5 3 Inactive 2019-02-15 2019-02-15
combined_table
id shipment_date quantity status
1 2019-02-15 5 Inactive
if for example table_b quantity changes from 5 to 7 and in table_c status change to 'Active' how can I update this in my delta table?
table_b
id a_id shipment_date quantity created_at updated_at
3 1 2019-02-15 5 2019-02-15 2019-02-16
table_c
id b_id status created_at updated_at
5 3 Active 2019-02-15 2019-02-16
Combined table should look like following. What is the best way?
combined_table
id shipment_date quantity status
1 2019-02-15 7 Active
Upvotes: 1
Views: 747
Reputation: 6279
From what I understand of your problem I think you may want to change those joins to full join
s. This will cover the case where an ID may exist in b
, c
or d
but doesn't yet exist in a
. After that you can use a where
predicate with a bunch of or
statements to check each table for changes.
Here is a simplified example of what I think you need:
-- set up the example tables
create or replace temporary table table_a (id number, record_updated_at timestamp_ntz);
create or replace temporary table table_b (id number, shipment_date date, record_updated_at timestamp_ntz);
create or replace temporary table table_c (id number, status varchar, record_updated_at timestamp_ntz);
-- add some sample data
insert overwrite into table_a values (1, '2019-01-01T01:00:00'), (2, '2019-01-01T04:00:00');
insert overwrite into table_b values (1, '2019-01-01','2019-01-01T01:00:00'), (3, '2019-01-02','2019-01-01T03:00:00');
insert overwrite into table_c values (1, 'shipped','2019-01-01T01:00:00');
-- return any records that have changed in any table
select
a.id a_id,
a.record_updated_at a_updated,
b.id b_id,
b.record_updated_at b_updated,
b.shipment_date,
c.id c_id,
c.status,
c.record_updated_at c_updated
from table_a a
full join table_b b on a.id = b.id
full join table_c c on a.id = c.id
where a.record_updated_at > '2019-01-01T02:00:00'
or b.record_updated_at > '2019-01-01T02:00:00'
or c.record_updated_at > '2019-01-01T02:00:00'
Alternatively you can do what Mike has mentioned and used tasks + streams. This is a pretty neat way of doing it too.
Upvotes: 0
Reputation: 7369
I'd take a look at Tasks and Streams in Snowflake. These would allow you to setup a stream on each of your tables to capture changes that have occurred there, and then run a task against those streams on a scheduled basis IF there are changes available.
Upvotes: 3