Reputation: 13
I have the following dataset.
ID var1 var2 var3
1 100 200
1 150 300
2 120
2 100 150 200
3 200 150
3 250 300
I would like to have a new dataset with only the last not blank record for each group of variables.
id var1 var2 var3
1 150 200 300
2 100 150 200
3 250 300 150
last. select the last reord, but i need to selet the last not null record
Upvotes: 0
Views: 191
Reputation: 51566
Looks like you want the last non missing value for each non-key variable. So you can let the UPDATE
statement do the work for you. Normally for update operation you apply transactions to a master dataset. But for your application you can use OBS=0
dataset option to make your current dataset work as both the master and the transactions.
data want ;
update have(obs=0) have ;
by id;
run;
Upvotes: 3
Reputation: 27498
Riccardo:
There are many ways to select, within each group, the last non-missing value of each column. Here are three ways. I would not say one is the best way, each approach has it's merits depending on the specific data set, coder comfort and long term maintainability.
Perhaps the simplest of the coding approaches goes like this:
UPDATE
statement to replace each like named variable with a non-missing value.Example:
data want_base_table(label="One row per id, original columns");
set have;
by id;
if first.id;
run;
* use have as a transaction data set in the update statement;
data want_by_update;
update want_base_table have;
by id;
run;
Others will involve arrays and first. and last. flag variables of the BY
group. This example shows a DOW
loop that tracks the non-missing value and then uses them for the output of each ID
:
data want_dow;
do until (last.id);
set have;
by id;
array myvar var1-var3 ;
array myhas has1-has3 ;
do _i = 1 to dim(myvar);
if not missing (myvar(_i)) then
myhas(_i) = myvar(_i);
end;
end;
do _i = 1 to dim(myhas);
myvar(_i) = myhas(_i);
end;
output;
drop _i has1-has3;
run;
A loop is most often called a DOW
loop when there is a SET
statement inside the DO; END;
block and the loop termination is triggered by the last.
flag variable. A similar non DOW approach (not shown) would use the implicit loop and first.
to initialize the tracking array and last.
for copying the values (tracked within group) into the columns for output.
data want_by_column_slices;
merge
have (keep=id var1 where=(var1 ne .))
have (keep=id var2 where=(var2 ne .))
have (keep=id var3 where=(var3 ne .))
;
by id;
if last.id;
run;
Upvotes: 0