stat_lap
stat_lap

Reputation: 1

two diagnosis within one year

Hi I have the following dataset. I am trying to find id who have at least 2 dx(411 or 430) within one year (dx1-dx5) and keep them in the dataset, otherwise drop. For example output in this case would be keeping id 1 and 3

data stat480.temp2;
input id $ date $9. dx1 $ dx2 $ dx3 $ dx4 $ dx5 $;
datalines;
    1 1/1/2005 411 412 413 414 415 
    1 2/2/2005 411 412 413 414 415
    2 1/1/2006 411 412 413 418 419
    2 3/3/2006 420 421 422 423 424
    3 3/4/2007 430 440 450 460 470
    3 12/4/2006 430 510 520 530 560
;
run;

Upvotes: 0

Views: 105

Answers (2)

Dominic Comtois
Dominic Comtois

Reputation: 10411

Still a little work remains to be done but this should work as a good starting point...

* First preping of data - eliminate rows that do not have dx411 nor dx430 ;
* and calculate time difference between rows of the same id) ;

data have;
  informat id 1. date mmddyy10. dx1-dx5 3.;
  format date yymmdd10. date_prev yymmdd10. delta_months 6.2;
  retain date_prev (.) id_prev (.);
  input id date dx1-dx5;
  array dx(5);

  if id = id_prev then 
    delta_months = intck("months", date_prev, date);

  dx_411 = 411 in dx;
  dx_430 = 430 in dx;

  if dx_411=1 or dx_430=1 then output;

  date_prev = date;
  id_prev = id;
  datalines;
1 1/1/2005 411 412 413 414 415 
1 2/2/2005 411 412 413 414 415
2 1/1/2006 411 412 413 418 419
2 3/3/2006 420 421 422 423 424
3 12/4/2006 430 510 520 530 560
3 3/4/2007 430 440 450 460 470
;

Note that I have pre-sorted data so that older lines come first within an id group.

Identify and keep only id's showing on more than 1 row

proc sql;
  create table want as
    select id, date, dx1, dx2, dx3, dx4, dx5,
           delta_months, dx_411, dx_430
      from have
     where id in (select id 
                    from have
                    group by id
                    having count(id) > 1);
quit;

proc print;run;

Results
Resulting table

What remains to be done:

  • filter out rows/ids for which delta_months is above your threshold.

  • filter out rows/ids that have, instead of 411-411 or 430-430, combinations 411-430 or 430-411. This could be done wither via proc sql or with a data step (incorporate it in the first one or in a subsequent independent one) using a retain statement. In this case-example, there were none of those.

Upvotes: 1

Sean
Sean

Reputation: 1120

I added a few extra rows to demonstrate that this works for more than two observations per year. Let me know if you have any questions.

data temp;
input id date $10. dx1 dx2 dx3 dx4 dx5;
datalines;
    1 01/01/2005 411 412 413 414 415 
    1 02/02/2005 411 412 413 414 415
    2 01/01/2006 411 412 413 418 419
    2 03/03/2006 420 421 422 423 424
    3 12/04/2006 430 510 520 530 560
    3 03/04/2007 430 440 450 460 470
    3 12/04/2007 430 510 520 530 560
    3 12/05/2007 430 510 520 530 560
    3 03/04/2008 411 440 450 460 470
;
run;

First, convert each row's date variable into a numeric format, and concatenate its dx variables (to more easily compare dx1 with dx4, for example):

data temp2;
    set temp;
    date_num = day(input(trim(date), mmddyy10.)) + 
        month(input(trim(date), mmddyy10.)) * 12 + year(input(trim(date), mmddyy10.)) * 365;
    all_dx = catx(', ', dx1, dx2, dx3, dx4, dx5);
run;

Second, join this dataset to itself such that every observation is compared to every other observation within a year of it:

proc sql;
    create table temp3 as select distinct
        a.*, b.date_num as date_num2, b.all_dx as all_dx2
        from temp2 as a
        left join temp2 as b
        on a.id = b.id and (b.date_num - 365) < a.date_num <= (b.date_num - 1);
quit;

Subset based on the 411 and 430 criteria:

data temp4;
    set temp3 (where = (
        (index(all_dx, "411") or index(all_dx, "430")) and
        (index(all_dx2, "411") or index(all_dx2, "430"))));
run;

Select distinct IDs remaining:

proc sql;
    create table temp5 as select distinct id from temp4;
quit;

Yielding the dataset:

1
3

Upvotes: 1

Related Questions