How to transpose dataset more simply

Question

I'd like to make the dataset like the below. I got it, but it’s a long program. I think it would become more simple. If you have a good idea, please give me some advice.

This is the data.

data test;
input ID $ NO DAT1 $ TIM1 $ DAT2 $ TIM2 $;
cards;
1   1   2020/8/4    8:30    2020/8/5    8:30
1   2   2020/8/18   8:30    2020/8/19   8:30
1   3   2020/9/1    8:30    2020/9/2    8:30
1   4   2020/9/15   8:30    2020/9/16   8:30
2   1   2020/8/4    8:34    2020/8/5    8:34
2   2   2020/8/18   8:34    2020/8/19   8:34
2   3   2020/9/1    8:34    2020/9/2    8:34
2   4   2020/9/15   8:34    2020/9/16   8:34
3   1   2020/8/4    8:46    2020/8/5    8:46
3   2   2020/8/18   8:46    2020/8/19   8:46
3   3   2020/9/1    8:46    2020/9/2    8:46
3   4   2020/9/15   8:46    2020/9/16   8:46
;
run;

This is my program.

 data
 t1(keep = ID A1 A2 A3 A4)
 t2(keep = ID B1 B2 B3 B4)
 t3(keep = ID C1 C2 C3 C4)
 t4(keep = ID D1 D2 D3 D4);
 set test;
 if NO = 1 then do;
 A1 = DAT1;
 A2 = TIM1;
 A3 = DAT2;
 A4 = TIM2;
 end;
 *--- cut (NO = 2, 3, 4 are same as NO = 1)--- ;
 end;
 if NO = 1 then output t1;
 if NO = 2 then output t2;
 if NO = 3 then output t3;
 if NO = 4 then output t4;
run;

proc sort data = t1;by ID; run;
proc sort data = t2;by ID; run;
proc sort data = t3;by ID; run;
proc sort data = t4;by ID; run;
data test2;
 merge t1 t2 t3 t4;
 by ID;
run;

Richard · Accepted Answer

Tall to very wide data transformations are typically

sketchy, you put data into metadata (column names or labels) or lose referential context, or
a reporting layout for human consumption

Presuming your "as dataset like below" is accurate and you want to pivot your data in such a manner.

Way 1 - self merging subsets with renaming

You should see that the NO field is a sequence number that can be used as a BY variable when merging data sets.

Consider this example code as a template that could be the source code generation of a macro:

NO is changed name to seq for better clarity

data want;
  merge
    have (where=(seq=1) rename=(dat1=A1 tim1=B1 dat2=C1 tim2=D1)
    have (where=(seq=2) rename=(dat1=A2 tim1=B2 dat2=C2 tim2=D2)
    have (where=(seq=3) rename=(dat1=A3 tim1=B3 dat2=C3 tim2=D3)
    have (where=(seq=4) rename=(dat1=A4 tim1=B4 dat2=C4 tim2=D4)
  ;
  by id;
run;

For unknown data sets organized like the above pattern, the code generation requirements should be obvious; determine maximum seq and have the names of variables to pivot be specified (as macro parameters, in which loop over the names occurs).

Way 2 - multiple transposes

Caution, all pivoted columns will be character type and contain the formatted result of original values.

proc transpose data=have(rename=(dat1=A tim1=B dat2=C tim2=D)) out=stage1;
  by id seq;
  var a b c d;
run;
  
proc transpose data=stage1 out=want;
  by id;
  var col1;
  id _name_ seq;
run;

Way 3 - Use array and DOW loop

* presume SEQ is indeed a unit monotonic sequence value;
data want (keep=id a1--d4);
  do until (last.id);
    array wide A1-A4 B1-B4 C1-C4 D1-D4;
    wide [ (seq-1)*4 + 1 ] = dat1;
    wide [ (seq-1)*4 + 2 ] = tim1;
    wide [ (seq-1)*4 + 3 ] = dat2;
    wide [ (seq-1)*4 + 4 ] = tim2;
  end;
  
  keep id A1--D4;

*  format A1 A3 B1 B3 C1 C3 D1 D3 your-date-format;
*  format A2 A4 ................. your-time-format;

Way 4 - change your data values to datetime

I'll leave this to esteemed others

How to transpose dataset more simply

Answers (2)

Related Questions