pinegulf
pinegulf

Reputation: 1396

Outputting conditionally from merge

I want to update a history file in SAS. I have new observations, which may overlap with existing data lines.

What is needed, is a file, which would have lines from dataset (new_data) where they exist and in case the lines do not exist, then from old set (old_data). What I've come up is a clunky merge operation, which is conditional on the order of the datasets. (==Works only if New_data is after Old_data. :?)

data new_data;
    input key value;
datalines;
    1 10
    1 11
    2 20
    2 21 
    ;
run;

data old_data;
    input key value;
    datalines;
    2 50
    2 51 
    3 30 
    3 31 
    ;
run;

So I'd like to have the following:

key value
1 10
1 11
2 20
2 21
3 30
3 31

However the following does not work. It produces the output below it.

data updated_history;
    merge New_data(in=a) old_data(in=b) ;
    by key;
    if a or (b and not a );
run;

....
2 50 
2 51 
...

But for some reason this does:

data updated_history;
    merge old_data(in=b) New_data(in=a);
    by key;
    if a or (b and not a );
run;

Question: Is there an intelligent way to manage from which dataset the values are select from. Something like: if a then value_from_dataset a;

Upvotes: 0

Views: 58

Answers (2)

Tom
Tom

Reputation: 51591

Since you have multiple observations per key value you probably do NOT want to use MERGE to combine these files. You could do it using SET by reading the data twice using two DOW loops. In that case it won't matter the order of the dataset in the SET statement since the records are interleaved instead of being joined. This first loop will calculate which of the two input datasets have any observations for this KEY value.

data want ;
  anyold=0;
  anynew=0;
  do until (last.key);
    set old_data (in=inold) new_data(in=innew);
    by key ;
    if inold then anyold=1;
    if innew then anynew=1;
  end;
  do until (last.key);
    set old_data (in=inold) new_data(in=innew);
    by key ;
    if not (anyold and anynew and inold) then output;
  end;
  drop anyold anynew;
run;

This type of combination is probably easier to code using SQL.

proc sql ;
  create table want as
    select key,value from new_data 
    union
    select key,value from old_data
    where key in (select key from old_data except select key from new_data)
    order by 1
  ;
quit;

Upvotes: 0

DomPazz
DomPazz

Reputation: 12465

The order in which you list the data sets in the MERGE is the order the data is taken. So when the order is old, new values from old are read and then values from new overwrite the values from old. This is why your second version works and the first does not.

Upvotes: 2

Related Questions