diaferiaj
diaferiaj

Reputation: 375

Counting observations using multiple BY groups SAS

I am examining prescription patterns within a large EHR dataset. The data is structured so that we are given several key bits of information, such as patient_num, encounter_num, ordering_date, medication, age_event (age at event) etc. Example below:

    Patient_num  enc_num  ordering_date  medication  age_event
        1111     888888     07NOV2008    Wellbutrin     48
        1111     876578     11MAY2011    Bupropion      50
        2222     999999     08DEC2009    Amitriptyline  32
        2222     999999     08DEC2009    Escitalopram   32
        3333     656463     12APR2007    Imipramine     44
        3333     643211     21DEC2008    Zoloft         45
        3333     543213     02FEB2009    Fluoxetine     45

Currently I have the dataset sorted by patient_id then by ordering_date so that I can see what each individual was prescribed during their encounters in a longitudinal fashion. For now, I am most concerned with the prescription(s) that were made during their first visit. I wrote some code to count the number of prescriptions and had originally restricted later analyses to RX = 1, but as we can see, that doesn't work for people with multiple scripts on the same encounter (Patient 2222).

    data pt_meds_;
     set pt_meds;
     by patient_num;
     if first.patient_num then  RX = 1;
     else RX + 1;
    run;

     Patient_num  enc_num  ordering_date  medication  age_event  RX
        1111     888888     07NOV2008    Wellbutrin     48        1
        1111     876578     11MAY2011    Bupropion      50        2
        2222     999999     08DEC2009    Amitriptyline  32        1
        2222     999999     08DEC2009    Escitalopram   32        2
        3333     656463     12APR2007    Imipramine     44        1
        3333     643211     21DEC2008    Zoloft         45        2
        3333     543213     02FEB2009    Fluoxetine     45        3

I think it would be more appropriate to recode the encounter numbers into a new variable so that they reflect a style similar to the RX variable. Where each encounter is listed 1-n, and the number will repeat if multiple scripts are made in the same encounter. Such as below:

     Patient_num  enc_num  ordering_date  medication  age_event  RX   Enc_
        1111     888888     07NOV2008    Wellbutrin     48        1    1
        1111     876578     11MAY2011    Bupropion      50        2    2
        2222     999999     08DEC2009    Amitriptyline  32        1    1 
        2222     999999     08DEC2009    Escitalopram   32        2    1
        3333     656463     12APR2007    Imipramine     44        1    1
        3333     643211     21DEC2008    Zoloft         45        2    2
        3333     543213     02FEB2009    Fluoxetine     45        3    3

From what I have seen, this could be possible with a variant of the above code using 2 BY groups (patient_num & enc_num), but I can't seem to get it. I think the first. / last. codes require sorting, but if I am to sort by enc_num, they won't be in chronological order because the encounter numbers are generated by the system and depend on all other encounters going in at that time.

I tried to do the following code (using ordering_date instead because its already sorted properly) but everything under Enc_ is printed as a 1. I'm sure my logic is all wrong. Any thoughts?

    data pt_meds_test;
     set pt_meds_;
     by patient_num ordering_date;
     if first.patient_num; 
     if first.ordering_date then enc_ = 1;
     else enc_ + 1;
    run; 

Upvotes: 0

Views: 5297

Answers (2)

fl0r3k
fl0r3k

Reputation: 629

First

.First/.Last flags doesn't require sorting if data is properly ordered or you use NOTSORTED in your BY statement. If your variable in BY statement is not properly ordered then BY statment will throw error and stop executing when encounter deviations. Like this:

data class;
   set sashelp.class;
   by age;
   first = first.age;
   last = last.age;
run;

 ERROR: BY variables are not properly sorted on data set SASHELP.CLASS.
 Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 FIRST.Age=1 LAST.Age=1 first=. last=. _ERROR_=1 _N_=1
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: There were 2 observations read from the data set SASHELP.CLASS.        

Try this code to see how exacly .first/.last flags works:

data pt_meds_test;
   set pt_meds_;
   by patient_num ordering_date;

   fp = first.patient_num;
   lp = last.patient_num;
   fo = first.ordering_date;
   lo = last.ordering_date;
run;

Second

Those condidions works differently than you think:

if expression;
  • If expression is true then continue with next instructions after if.
  • Otherwise return to begining of data step (no implicit output). This also implies your observation is not retained in the output.

In most cases if without then is equivalent to where. However

  • whereworks faster but it is limited to variables that comes from data set you are reading
  • if can be used with any type of expression including calculated fields

More info:: IF Statement, Subsetting

Third

I think lag() function can be your answear.

data pt_meds_test;
   set pt_meds_;
   by patient_num;

   retain enc_;

   prev_patient_num = lag(patient_num);
   prev_ordering_date = lag(ordering_date);

   if first.patient_num then enc_ = 1;
   else if patient_num = prev_patient_num and ordering_date ne prev_ordering_date then enc_ + 1;
   end;
run;

With lag() function you can look what was the value of vairalbe on the previos observation and compare it with current one later. But be carefull. lag() doesn't look for variable value from previous observation. It takes vale of variable and stores it in a FIFO queue with size of 1. On next call it retrives stored value from queue and put new value there. More info: LAG Function

Upvotes: 4

Jonathan Wilson
Jonathan Wilson

Reputation: 676

I'm not sure if this hurts the rest of your analysis, but what about just

proc freq data=pt_meds noprint; 
    tables patient_num ordering_date / out=pt_meds_freq;
run;

data pt_meds_freq2;
    set pt_meds_freq;
    by patient_num ordering_date;
    if first.patient_num;
run;

Upvotes: 0

Related Questions