B--rian
B--rian

Reputation: 5870

Translating a SQL update to SAS DI

Assume we have a table P_DEF in which we want to update the value of column RUN_ID for a certain subset which we stored in another table TMP. Here how I would do it in SQL:

update P_DEF
set RUN_ID = (-1) * TMP.RUN_ID /* change the sign of the value */
from P_DEF
inner join TMP
on P_DEF.RUN_ID = TMP.RUN_ID
and P_DEF.ITEM_ID = TMP.ITEM_ID
and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE

Now the big question: To my knowledge, a proc SQL does not support this kind of filtered update. So how do I do this with a minimal number of transformations in SAS DI(S)?

Upvotes: 0

Views: 354

Answers (1)

vasja
vasja

Reputation: 4792

Update by join is not supported by SAS SQL, but you can do CORRELATED UPDATE: update by values from correlated subquery:

data P_DEF;
infile cards;
length RUN_ID_ORIG 8;
input RUN_ID ITEM_ID ITEM_TITLE $20.;
RUN_ID_ORIG = RUN_ID;
cards;
1 1 some title
1 1 should be negative
1 2 another title
1 3 should be negative
4 44 another title
5 44 should be negative
;
run;

data TMP;
infile cards;
input RUN_ID ITEM_ID ITEM_TITLE $20. @30 NEW_ID;
cards;
1 1 should be negative       100
1 3 should be negative       123
5 44 should be negative      188
;
run;

proc sql;
/* this unwillingly updates all records, nonmatched will be set to null */
update P_DEF
set RUN_ID = (select NEW_ID from TMP
            where P_DEF.RUN_ID = TMP.RUN_ID
            and P_DEF.ITEM_ID = TMP.ITEM_ID
            and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE )
;
select * from P_DEF
;
quit;

The correlated update is not sufficient when there are non-matches, so you need to add filter to only update matched rows. When joining on multiple columns, I usually rely on catx to get unique values (depending on you data, you might need to use different numeric formats in put functions):

proc sql;
update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;


/* correct "inner join" update */
proc sql;
update P_DEF
set RUN_ID = (select NEW_ID from TMP
            where P_DEF.RUN_ID = TMP.RUN_ID
            and P_DEF.ITEM_ID = TMP.ITEM_ID
            and P_DEF.ITEM_TITLE = TMP.ITEM_TITLE )
where
          catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
in select catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
from TMP
;
select * from P_DEF;
quit;

The version above is slightly different from your exact example to show how to get value from subquery - the NEW_ID column.

Simplified version where you only use lookup table to identify rows to be updated is this:

proc sql;
update P_DEF set RUN_ID = RUN_ID_ORIG; /* reset RUN_ID */
quit;

proc sql;
/* simplified for your case:
you dont actually use value from TMP that does not exist in P_DEF */
update P_DEF
set RUN_ID = -1 * RUN_ID
where
   RUN_ID > 0 /* so we can rerun this if needed */
   and      catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
in ( select catx('#', put(RUN_ID, 16.), put(ITEM_ID, 16.), ITEM_TITLE)
from TMP )
;
select * from P_DEF;
quit;

As you can see, the correlated update might need two subqueries to update single column, so don't expect it to be perfomant on bigger tables. You might be better with data step methods: MERGE, MODIFY or UPDATE statements.

As for SAS Data Integration Studio tranformation you asked for, I believe you can achieve this with SCD Type 1 Loader, this will generate some of the code I mentioned.

Upvotes: 1

Related Questions