Reputation: 75

Get all entries for a BY variable

I have an example data set

data test;
input ID 1-4
var1 $ 5-13
;
datalines;
1   Apples
2   Chocolate
3   Milk
3   Cocoa
3   Cake
3   Sugar
4   Marmelade
5   Banana
6   Rice
6   Beef
;
run;

I want to create a third variable all_names where I group all entries for var1 given that the ID appears multiple times. So for ID = 3, all_names should read "milk, cocoa, cake, sugar" and for ID = 6 it sould say "Rice, Beef".

I have some code that works but only in cases where ID appears twice:

 data step1;

  set test;

  by ID;


  prevID=lag(ID);
  prevVar1=lag(var1);

  if first.ID then prevID = . ;

  if ID eq prevID then all_names = cat(var1, ", ", prevVar1);

  else all_names = var1;
  
  drop prevID prevVar1;
  run;

How can I make my code work for cases where ID appears several times? I tried playing around with first.ID and last.ID but haven't figured it out yet. Thanks!

Upvotes: 0

Answers (3)

Negdo

Reputation: 532

Another way without a loop (and without lag function, which is a bit unintuitive to use when using by statement):

data have;
input ID 1-4
var1 $ 5-13
;
datalines;
1   Apples
2   Chocolate
3   Milk
3   Cocoa
3   Cake
3   Sugar
4   Marmelade
5   Banana
6   Rice
6   Beef
;
run;

If you want 10 obs:

data step1;
    set have;
    by ID;
    length all_names $200;
    retain all_names;
    if first.ID then all_names = var1;
    else all_names = cats(all_names,', ',var1);
run;

If you want only one obs for each unique ID just add this:

data step2;
    set step1 (drop=var1);
    by ID;
    if last.ID then output;
run;

Upvotes: 1

fl0r3k

Reputation: 629

Accepted answer is so overcomplicated. I will just leave this simpler solution here.

/* If needed just to make sure it's sorted by ID for datastep group by processing */
proc sort data=test;
   by ID;
run;

data temp;
   set test;
   by ID;

   length all_names $200;
   retain all_names;
   
   all_names = catx(', ', all_names, var1);
   
   if last.ID then output;
   
   drop var1;
run;

proc sql;
   create table results as
      select t1.*, t2.all_names
      from test as t1, temp as t2
      where t1.ID = t2.ID
   ;
quit;

Upvotes: 0

PeterClemmensen

Reputation: 4937

Here are solutions for if you want 10 obs in the resulting data (like in the input data) and one obs for each unique ID.

data test;
input ID 1-4
var1 $ 5-13
;
datalines;
1   Apples
2   Chocolate
3   Milk
3   Cocoa
3   Cake
3   Sugar
4   Marmelade
5   Banana
6   Rice
6   Beef
;
run; 

data want(drop = var1);

   do until (last.ID);
      set test;
      by ID;
      length all_names $200;
      all_names = catx(', ', all_names, var1);
   end;

run;

data want2;

   do until (last.ID);
      set test;
      by ID;
      length all_names $200;
      all_names = catx(', ', all_names, var1);
   end;

   do until (last.ID);
      set test;
      by ID;
      output;
   end;

run;

Upvotes: 3

Get all entries for a BY variable

Answers (3)

Related Questions