jmf
jmf

Reputation: 396

How to keep updating a table which is created from multiple tables with different update times

I want to create a table by joining multiple source tables.

This table should have new entries or updates in last 24 hours from the source tables.

I will insert the new data and delete the updated data and reinsert it.

The problem is that the source tables are not updated at the same time.

What is the best way to keep this table up to date with the data from source tables.

Following example code will work if table 'a' is updated but what if table 'b' and 'c' are updated later how can I update my new table as well to get the updated fields from those tables?

I am using snowflake database.

insert into combined_table
select a.id, max(b.shipment_date), b.quantity, c.status 
from table_a a 
left join table_b b on a.id=b.a_id 
left join table_c c on b.id=c.b_id 
a.record_updated_at > dateadd(HOUR, -24, CURRENT_TIMESTAMP)
group by a.id, c.status

table_a
id  created_at   updated_at 
1   2019-02-14   2019-02-16

table_b
id  a_id  shipment_date  quantity created_at  updated_at  
3   1     2019-02-15     5        2019-02-15  2019-02-16

table_c
id   b_id   status    created_at   updated_at
5    3      Inactive  2019-02-15   2019-02-15

combined_table
id shipment_date  quantity status
1  2019-02-15     5        Inactive

if for example table_b quantity changes from 5 to 7 and in table_c status change to 'Active' how can I update this in my delta table?

table_b
id  a_id  shipment_date  quantity created_at  updated_at  
3   1     2019-02-15     5        2019-02-15  2019-02-16

table_c
id   b_id   status    created_at   updated_at
5    3      Active    2019-02-15   2019-02-16

Combined table should look like following. What is the best way?

combined_table
id shipment_date  quantity status
1  2019-02-15     7        Active

Upvotes: 1

Views: 747

Answers (3)

Jeffrey Jacobs
Jeffrey Jacobs

Reputation: 332

Look at new STREAMS ON VIEW feature.

Upvotes: 0

Simon D
Simon D

Reputation: 6279

From what I understand of your problem I think you may want to change those joins to full joins. This will cover the case where an ID may exist in b, c or d but doesn't yet exist in a. After that you can use a where predicate with a bunch of or statements to check each table for changes.

Here is a simplified example of what I think you need:

-- set up the example tables
create or replace temporary table table_a (id number, record_updated_at timestamp_ntz);
create or replace temporary table table_b (id number, shipment_date date, record_updated_at timestamp_ntz);
create or replace temporary table table_c (id number, status varchar, record_updated_at timestamp_ntz);

-- add some sample data
insert overwrite into table_a values (1, '2019-01-01T01:00:00'), (2, '2019-01-01T04:00:00');
insert overwrite into table_b values (1, '2019-01-01','2019-01-01T01:00:00'), (3, '2019-01-02','2019-01-01T03:00:00');
insert overwrite into table_c values (1, 'shipped','2019-01-01T01:00:00');

-- return any records that have changed in any table
select
    a.id a_id, 
    a.record_updated_at a_updated,
    b.id b_id, 
    b.record_updated_at b_updated,
    b.shipment_date,
    c.id c_id,
    c.status,
    c.record_updated_at c_updated
from table_a a
full join table_b b on a.id = b.id
full join table_c c on a.id = c.id 
where a.record_updated_at > '2019-01-01T02:00:00'
    or b.record_updated_at  > '2019-01-01T02:00:00'
    or c.record_updated_at > '2019-01-01T02:00:00'

Alternatively you can do what Mike has mentioned and used tasks + streams. This is a pretty neat way of doing it too.

Upvotes: 0

Mike Walton
Mike Walton

Reputation: 7369

I'd take a look at Tasks and Streams in Snowflake. These would allow you to setup a stream on each of your tables to capture changes that have occurred there, and then run a task against those streams on a scheduled basis IF there are changes available.

Upvotes: 3

Related Questions