Epidemiology15
Epidemiology15

Reputation: 1

Attempting to Automatically Create Label of Years Used in an Analysis

Say I had 5 years of data that were being used to calculate some measure across those aggregated years. Sometimes those are 5 consecutive years and other times data was not available for a given year so it must be skipped. For example 2016-2020 vs 2015-2017 & 2019-2020. In this case data was not available for 2018. I have been given a set of rules for how these years should be presented.

Consecutive years should be ex: 2016-2020

Non-Consecutive Years Will Look slightly different depending on where the missing year(s) occur.

While it would be trivial just to produce a comma separated list of all years used this is how they want the years presented. These labels are for a series of different measures so I am attempting to create these labels automatically within a macro. The number of years of data is also not always 5. It could be 3 years or even 10 years.

The obvious first idea was a do until process that started at the minimum year and progressively compared against the next year used in the analysis looking to see if the years were consecutive. Given the number of years isn't consistently 5 this was what made the most sense so far but I have not worked with do until loops very much. As such I couldn't quite figure out how to progressivley build the label over the iterations of the do until loop while also adhering to these rules.

For this example lets use the years 2015,2016,2017,2019,2020.

Any help would be greatly appreciated.

Upvotes: 0

Views: 62

Answers (4)

data _null_
data _null_

Reputation: 9109

This version creates the combined list. I think it has the features you describe.

data test;
    infile cards dsd;
    input x @@;
    d = dif(x); /*used to create RUN when dif > 1 increment run*/
    if d eq . or d > 1 then run+1;
    cards;
2015,2016,2017,2019,2020,2022,2024,2025,2026
;;;;
    run;
proc print;
    run;

data list(keep=combinedlist);
    length list $128 combinedList $256;
    do while(not eof);
        list=' ';
        do runlength=1 by 1 until(last.run); /*loop until last.run*/
            set test end=eof;
            by run;
            if first.run then list = cats(x); /*start of list*/
            end;    
         select(runlength); /*based on run-length create LIST */
            when(1);
            when(2)   list = catx(' & ',list,x);
            otherwise list = catx('-',list,x);
            end;
        combinedList = catx(', ',combinedList,list);
        end;
    output;
    stop;
    run;
 proc print;
    run;

enter image description here

Upvotes: 0

data _null_
data _null_

Reputation: 9109

This should get you started.

data test;
    infile cards dsd;
    input x @@;
    d = dif(x); /*used to create RUN when dif > 1 increment run*/
    if d eq . or d > 1 then run+1;
    cards;
2015,2016,2017,2019,2020,2022,2024,2025,2026
;;;;
    run;
proc print;
    run;
proc summary data=test nway; /*count the number of years in each run*/
    class run;
    output out=runlen(drop=_type_);
    run;
data test; /* merge TEST and RUNLEN*/
    length list $128;
    do until(last.run); /*loop until last.run*/
        merge test runlen;
        by run;
        if first.run then list = cats(x); /*start of list*/
        end;    
     select(_freq_); /*based on run-length create LIST */
        when(1);
        when(2)   list = catx(' & ',list,x);
        otherwise list = catx('-',list,x);
        end;
    run;
 proc print;
    run;

Upvotes: 1

Richard
Richard

Reputation: 27508

This could be a case of a picture is worth a thousand words.

Example:

/* simulate raw results of a survey of 10 questions over 16 years */
data surveyresults;
  call streaminit(20230125);
  do qid = 1 to 10;
    do year = 2007 to 2022;
      if year = 2021 then continue;
      if rand('uniform') > 0.85 then continue;
      do _n_ = 1 to rand('integer', 30);
        pid + 1;
        if rand('uniform') > 0.85 then continue;
        answercode = rand('integer', 20);
        output;
      end;
    end;
  end;
run;

proc sql noprint;
  create table stage1 as
  select distinct qid, year, 1 as flag
  from surveyresults
  order by qid, year
  ;
  
  select catx(' ', min(year), 'to', max(year))
  into :year_range
  from stage1 ;

ods html file='plot.html';

proc sgplot data=stage1;
  scatter x=year y=qid / markerattrs=(symbol=squarefilled size=8.2%);
  xaxis values=(&year_range);
  yaxis type=discrete;
run;

ods html close;

enter image description here

Upvotes: 1

Reeza
Reeza

Reputation: 21274

Probably an easier way than this, but this works for your scenarios.

data years;
input year;
cards;
2015
2016
2017
2019
2020
;
run;

/* data years; */
/* input year; */
/* cards; */
/* 2010 */
/* 2012 */
/* 2014 */
/* 2016 */
/* 2017 */
/* ; */
/* run; */

/* data years; */
/* input year; */
/* cards; */
/* 2015 */
/* 2016 */
/* 2017 */
/* 2018 */
/* 2020 */
/* ; */
/* run; */



data want;
merge years end=eof years(firstobs=2 rename=year=next_year);
length year_list $200. interval $20.;;
retain year_list start_year;

_dif= next_year - year;
if _n_=1 then start_year=year; 

if _dif > 1 or eof  then do;


if start_year ne year then interval = catx('-', start_year, year);
else interval = put(start_year, 8. -l);



 if eof then year_list=catx(" & ", year_list, interval); 
 else year_list = catx(", ", year_list, interval); 

start_year = next_year;
end;

if eof then call symputx('year_list', year_list);
run;

%put &year_list;

Upvotes: 0

Related Questions