Reputation: 1396
I want to update a history file in SAS. I have new observations, which may overlap with existing data lines.
What is needed, is a file, which would have lines from dataset (new_data) where they exist and in case the lines do not exist, then from old set (old_data). What I've come up is a clunky merge operation, which is conditional on the order of the datasets. (==Works only if New_data is after Old_data. :?)
data new_data;
input key value;
datalines;
1 10
1 11
2 20
2 21
;
run;
data old_data;
input key value;
datalines;
2 50
2 51
3 30
3 31
;
run;
So I'd like to have the following:
key value
1 10
1 11
2 20
2 21
3 30
3 31
However the following does not work. It produces the output below it.
data updated_history;
merge New_data(in=a) old_data(in=b) ;
by key;
if a or (b and not a );
run;
....
2 50
2 51
...
But for some reason this does:
data updated_history;
merge old_data(in=b) New_data(in=a);
by key;
if a or (b and not a );
run;
Question: Is there an intelligent way to manage from which dataset the values are select from. Something like: if a then value_from_dataset a;
Upvotes: 0
Views: 58
Reputation: 51591
Since you have multiple observations per key value you probably do NOT want to use MERGE to combine these files. You could do it using SET by reading the data twice using two DOW loops. In that case it won't matter the order of the dataset in the SET statement since the records are interleaved instead of being joined. This first loop will calculate which of the two input datasets have any observations for this KEY value.
data want ;
anyold=0;
anynew=0;
do until (last.key);
set old_data (in=inold) new_data(in=innew);
by key ;
if inold then anyold=1;
if innew then anynew=1;
end;
do until (last.key);
set old_data (in=inold) new_data(in=innew);
by key ;
if not (anyold and anynew and inold) then output;
end;
drop anyold anynew;
run;
This type of combination is probably easier to code using SQL.
proc sql ;
create table want as
select key,value from new_data
union
select key,value from old_data
where key in (select key from old_data except select key from new_data)
order by 1
;
quit;
Upvotes: 0
Reputation: 12465
The order in which you list the data sets in the MERGE
is the order the data is taken. So when the order is old
, new
values from old
are read and then values from new
overwrite the values from old
. This is why your second version works and the first does not.
Upvotes: 2