Reputation: 75
I have an example data set
data test;
input ID 1-4
var1 $ 5-13
;
datalines;
1 Apples
2 Chocolate
3 Milk
3 Cocoa
3 Cake
3 Sugar
4 Marmelade
5 Banana
6 Rice
6 Beef
;
run;
I want to create a third variable all_names where I group all entries for var1 given that the ID appears multiple times. So for ID = 3, all_names should read "milk, cocoa, cake, sugar" and for ID = 6 it sould say "Rice, Beef".
I have some code that works but only in cases where ID appears twice:
data step1;
set test;
by ID;
prevID=lag(ID);
prevVar1=lag(var1);
if first.ID then prevID = . ;
if ID eq prevID then all_names = cat(var1, ", ", prevVar1);
else all_names = var1;
drop prevID prevVar1;
run;
How can I make my code work for cases where ID appears several times? I tried playing around with first.ID and last.ID but haven't figured it out yet. Thanks!
Upvotes: 0
Views: 69
Reputation: 532
Another way without a loop (and without lag function, which is a bit unintuitive to use when using by statement):
data have;
input ID 1-4
var1 $ 5-13
;
datalines;
1 Apples
2 Chocolate
3 Milk
3 Cocoa
3 Cake
3 Sugar
4 Marmelade
5 Banana
6 Rice
6 Beef
;
run;
If you want 10 obs:
data step1;
set have;
by ID;
length all_names $200;
retain all_names;
if first.ID then all_names = var1;
else all_names = cats(all_names,', ',var1);
run;
If you want only one obs for each unique ID just add this:
data step2;
set step1 (drop=var1);
by ID;
if last.ID then output;
run;
Upvotes: 1
Reputation: 629
Accepted answer is so overcomplicated. I will just leave this simpler solution here.
/* If needed just to make sure it's sorted by ID for datastep group by processing */
proc sort data=test;
by ID;
run;
data temp;
set test;
by ID;
length all_names $200;
retain all_names;
all_names = catx(', ', all_names, var1);
if last.ID then output;
drop var1;
run;
proc sql;
create table results as
select t1.*, t2.all_names
from test as t1, temp as t2
where t1.ID = t2.ID
;
quit;
Upvotes: 0
Reputation: 4937
Here are solutions for if you want 10 obs in the resulting data (like in the input data) and one obs for each unique ID.
data test;
input ID 1-4
var1 $ 5-13
;
datalines;
1 Apples
2 Chocolate
3 Milk
3 Cocoa
3 Cake
3 Sugar
4 Marmelade
5 Banana
6 Rice
6 Beef
;
run;
data want(drop = var1);
do until (last.ID);
set test;
by ID;
length all_names $200;
all_names = catx(', ', all_names, var1);
end;
run;
data want2;
do until (last.ID);
set test;
by ID;
length all_names $200;
all_names = catx(', ', all_names, var1);
end;
do until (last.ID);
set test;
by ID;
output;
end;
run;
Upvotes: 3