Reputation: 167
I'm not sure of the best way to describe this, and I'll admit that the code I wrote to recreate the problem in a smaller format isn't quite accurate.
I have 7 data sets that have the same number of columns (122) but a different number of rows. The labels for these columns are identical except for an underscore and an integer. Example: first column of each data set is "study_id_1" "study_id_2" ... "study_id_7"
I am trying to stack each of these data sets, in numerical order, on top of each other AND drop the underscore and integer.
However, if I use this code, all of the values are in chunks but along a diagonal.
data all;
set PT_BS1_all PT_BS2_all PT_BS3_all PT_BS4_all PT_BS5_all PT_BS6_all PT_BS7_all;
run;
The following code (written in SAS Studio) pretty much illustrates the problem and the "diagonal." However, in my actual data (working in SAS EG), all of the missing values are periods, regardless of variable type. In the example below, I could only get periods to appear for missing values for the numerical variables.
data have;
input study_id_1 $ variable1_1 $ variable2_1 variable3_1 study_id_2 $ variable1_2 $ variable2_2 variable3_2 study_id_3 $ variable1_3 $ variable2_3 variable3_3;
cards;
A treatment 35 24 . . . . . . . .
B placebo 24 44 . . . . . . . .
C treatment 66 77 . . . . . . . .
D placebo 73 45 . . . . . . . .
. . . . A treatment 23 34 . . . .
. . . . B placebo 43 56 . . . .
. . . . C treatment 34 34 . . . .
. . . . D placebo 54 67 . . . .
. . . . . . . . A treatment 22 66
. . . . . . . . B placebo 33 67
. . . . . . . . C treatment 23 48
. . . . . . . . D placebo 69 70
;
run;
proc print data=have;
run;
data want;
input study_id $ variable1 $ variable2 variable3;
cards;
A treatment 35 24
B placebo 24 44
C treatment 66 77
D placebo 73 45
A treatment 23 34
B placebo 43 56
C treatment 34 34
D placebo 54 67
A treatment 22 66
B placebo 33 67
C treatment 23 48
D placebo 69 70
;
run;
proc print data=want;
run;
I hope I've described the problem sufficiently and thanks for any help.
Upvotes: 0
Views: 252
Reputation: 27498
The first non-missing from a list of values is returned by the COALESCE
and COALESCEC
functions.
A list of variables is very simple in your data set because alike variables have a common prefix (and 1,2,3 suffixes). The syntax for specifying the alike variables is <prefix>:
Example:
data want;
set have;
* coalesce during stacking;
* set PT_BS1_all PT_BS2_all PT_BS3_all PT_BS4_all PT_BS5_all PT_BS6_all PT_BS7_all;
length study_id $8 variable1 $9;
study_id = coalesceC(of study_id_:);
variable1 = coalesceC(of variable1_:);
variable2 = coalesce (of variable2_:);
variable3 = coalesce (of variable3_:);
drop study_id_: variable1_: variable2_: variable3_:;
run;
Upvotes: 1
Reputation: 107567
Rather than clean up the compiled dataset output that are diagonal due to misaligned column names, adjust the inputs by appropriately renaming columns. Specifically, remove the suffix at underscore with scan
using a dynamic macro of oldname=newname
pattern built from proc sql
. Then pass this macro into a subsequent rename
command.
Below assumes all datasets resides in WORK
library. Adjust SQL WHERE
accordingly.
%macro rename_cols(dset);
proc sql noprint;
select cats(name,'=',scan(name, 1, '_'))
into :suffix_clean separated by ' '
from dictionary.columns
where libname = 'WORK' and memname = "&dset.";
quit;
data &dset;
set &dset;
rename &suffix_clean;
run;
%mend rename_cols;
%rename_cols(PT_BS1_ALL);
%rename_cols(PT_BS2_ALL);
%rename_cols(PT_BS3_ALL);
%rename_cols(PT_BS4_ALL);
%rename_cols(PT_BS5_ALL);
%rename_cols(PT_BS6_ALL);
%rename_cols(PT_BS7_ALL);
data all;
set PT_BS1_all
PT_BS2_all
PT_BS3_all
PT_BS4_all
PT_BS5_all
PT_BS6_all
PT_BS7_all;
run;
Upvotes: 0