DJJ
DJJ

Reputation: 2549

update a table efficiently using proc sql

I need to update a table with past missing information using past versions of the same table. The update is needed as some earlier information are not available anymore in the most recent table. Let tableA be the table at time0 and tableB the table at time1 and so on. What i'm really interested in is the last updated table.

So far i have tried this method;

create view _tableB
select * 
from tableA 
union 
select * 
from tableB a 
where a.id not in (select id from tableA);

then I proceeded with:

create view _tableC
select *
from _tableB
union 
select *
from tableC a
where a.id not in (select id from _tableB); 

and so on till i reach my final table where i create a table.

create table _tableT 
select * 
from _tableS 
union 
select * 
from tableT a
where a.id not in (select id from _tableS); 

Do you see something better up here?

P.S: I have to mention that for each observations i can have many languages. The information was in this way

id|lguage1|lguage2|lguage3|

and i put it wide2long using a view and this method.

id1|lguage1 
id1|lguage2 
id1|lguage3
id2|lguage1

The informations are not sorted by id and language.
Thanks.

Upvotes: 1

Views: 250

Answers (1)

mvherweg
mvherweg

Reputation: 1283

Given a couple of assumptions, you can simplify this greatly:
1. Each individual dataset has no duplicate ID's.
2. Each dataset is sorted by ID.

Then you simply do this: data RESULT; set _tableS ... _tableC _tableB _table; by id; if first.id; run;

This way, the code is more compact and all data is only read once. It will read in all the tables and output an interweaved result (i.e.: the result will also by sorted on ID)

The firs.id filter means that he will only keep the first record of each id value he encounters. Since the datasets are mentioned with the most recent first, he will take the most recent record available for each ID and ditch the others.

Upvotes: 1

Related Questions