Sara A.
Sara A.

Reputation: 95

Delete identical columns in SAS

I am working with a very large dataset containing the same columns several times, but with different column names (both character and numeric). Does anyone know how to find and delete these identical columns?

Example

A    B      C     D     E    F    G
12   ab     12    ab    8    h   12
14   cd     14    cd    65   j   14
6    fs      6    fs    3    g    6
.     .      .     .    4    q    .
3     d      3     d    5    d    3

A-G are variable names, and I want to be able to see that A, C and G are identical and then remove all except one. Also B and D are identical. I want to keep only one.

Is this even possible?

Upvotes: 4

Views: 1132

Answers (1)

Tom
Tom

Reputation: 51566

Here is example using technique proposed by Shenglin Chen in the comments.

data have ;
 input A B $ C D $ E F $ G ;
cards;
12 ab 12 ab 8 h 12
14 cd 14 cd 65 j 14
6 fs 6 fs 3 g 6
. . . . 4 q .
3 d 3 d 5 d 3
;;;;

Find the unique numeric columns.

proc transpose data=have out=tall_numbers ;
  var _numeric_;
run;
proc sort data=tall_numbers nodupkey out=keep_numbers(keep=_name_); 
  by col: ;
run;

Find the unique character columns.

proc transpose data=have out=tall_characters ;
  var _character_;
run;
proc sort data=tall_characters nodupkey out=keep_characters(keep=_name_); 
  by col: ;
run;

Get the combined list of columns.

proc sql noprint ;
  select _name_ 
  into :keep_list separated by ' ' 
  from (select _name_ from keep_characters
  union select _name_ from keep_numbers)
  order by 1
  ;
quit;

Make new table with only the unique columns.

data want ; 
  set have ;
  keep &keep_list ;
run;

Upvotes: 2

Related Questions