richard
richard

Reputation: 1

Creating groups of specific data per subject using SAS

My objective is to determine for each subject how many data observations are getting, at least, 2 consecutives "Y" as eligible value. For most of subjects, case only occur once but I realized looking to data that for some subjects it can happen 2, 3 times. So I need to create an extra variable (called GROUP) to keep track of these multiple occurrences within subjects. By using SAS language, could someone help me to get GROUP variable properly created ? Detailed below is an dataset example of subjects (ID) with different study days (CVDY) and eligibility criteria (Y/N format) for a specific lab parameter (not included in the example).

Thanks for your support.

data WHAT_I_HAVE;
length ID CVDY $3. ELIG $2.;
infile datalines TRUNCOVER;
input ID $ CVDY $ ELIG $ ;
datalines;
101 1 N
101 2 Y
101 3 Y
101 4 N
201 1 Y
201 2 Y
201 3 N
201 4 Y
201 5 Y 
201 6 Y
201 7 N
201 8 Y
201 9 Y
301 1 Y
301 2 Y
301 3 N
301 4 N
301 5 Y
;
run;


data WHAT_I_WANT;
length ID CVDY $3. ELIG GROUP $2.;
infile datalines TRUNCOVER;
input ID $ CVDY $ ELIG $ GROUP $;
datalines;
101 1 N .
101 2 Y 1
101 3 Y 1
101 4 N .
201 1 Y 1
201 2 Y 1
201 3 N .
201 4 Y 2
201 5 Y 2
201 6 Y 2
201 7 N .
201 8 Y 3
201 9 Y 3
301 1 Y 3
301 2 Y 3
301 3 N .
301 4 N .
301 5 Y .
301 6 N .
;
run;

Upvotes: 0

Views: 219

Answers (1)

Tom
Tom

Reputation: 51566

You can use a double DOW loop. The first loop you can use to count how many rows contribute to this run of contiguous values of ELIG (within this value of ID). You need to use the NOTSORTED keyword on the BY statement to have the data step keep track of when the value of ELIG changes.

Now you have the information you need to know whether or not to increment your counter of the number of runs of two or more Y values in a row. To get your exact output you will need to use two variables. One that keeps the running count and the other to be the value you want to write.

The second DO loop just allows you to re-read the detail lines and write them back out so that the same value of GROUP is attached to each row in the run.

data want;
  do rows=1 by 1 until(last.elig);
    set have;
    by id elig notsorted;
    if first.id then cnt=0;
  end;
  if elig='Y' and rows>1 then do;
    cnt+1;
    group=cnt;
  end;
  do rows=1 to rows;
    set have;
    output;
  end;
  drop rows cnt;
run;

Results

Obs    ID     CVDY    ELIG    group

  1    101     1       N        .
  2    101     2       Y        1
  3    101     3       Y        1
  4    101     4       N        .
  5    201     1       Y        1
  6    201     2       Y        1
  7    201     3       N        .
  8    201     4       Y        2
  9    201     5       Y        2
 10    201     6       Y        2
 11    201     7       N        .
 12    201     8       Y        3
 13    201     9       Y        3
 14    301     1       Y        1
 15    301     2       Y        1
 16    301     3       N        .
 17    301     4       N        .
 18    301     5       Y        .

Note there appears to be a typo in your expected results as the last ID only has one run of length 2.

Upvotes: 1

Related Questions