SAS - Find and assign an ID based on all possible combinations of multiple variables

Question

I need to assign the same ID to every combination of the same values across three variables, and I really have no clue how to spot combinations of the same three letters regardless of their order (A-B-C should match B-C-A). Here's my input data:

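data HAVE;
/* TRUNCOVER stops INPUT from reading the next line when VAR3 is absent */
infile datalines truncover;
/* Each character variable needs its own $ */
input ID VAR1 $ VAR2 $ VAR3 $;
DATALINES;
001 A B C
002 A C B
003 B C A
004 A B
005 B A
006 D E F
007 E F D
008 F E D
009 E F
010 F E
;
RUN;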

And the resulting ID_NEW should be:

data WANT;
/* A single period marks a missing VAR3 so that ID_NEW is read correctly */
input ID VAR1 $ VAR2 $ VAR3 $ ID_NEW;
DATALINES;
001 A B C 1
002 A C B 1
003 B C A 1
004 A B . 2
005 B A . 2
006 D E F 3
007 E F D 3
008 F E D 3
009 E F . 4
010 F E . 4
;
RUN;

I am able to spot combinations of two letters with proc sql, by performing a left join with the keys t1.var1=t2.var2 and t1.var2=t2.var1. But when it comes to spotting the three-letter combinations, I want to avoid the join, because there are 6 possible orderings and I feel that there's a smarter way to do so without repeating the join 6 times! Perhaps with a combination of the catt and scan functions?

Thank you in advance for your help :) !

Chris Long · Accepted Answer

You will be able to do this using the CALL SORTC routine, which sorts the values of its character arguments into ascending (alphabetical) order:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003106052.htm

One way would be to create a new variable that contains the values of VAR1-VAR3 in alphabetical order:

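data want;
  length sorted_vars $ 20;
  set have;
  /* VAR1-VAR3 must all be character for CALL SORTC */
  array vars[*] var1-var3;
  /* Sorts in place, so VAR1-VAR3 are reordered in the output; blanks sort first */
  call sortc(of vars[*]);
  /* CATS strips blanks, so A/B/C in any order gives the key "ABC" */
  sorted_vars = cats(of vars[*]);
run;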

The above code isn't tested but should be pretty close. From there, you can sort on sorted_vars and increment your id_new variable on each first.sorted_vars.
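
The sort-and-increment step described above could look roughly like the sketch below. It is also untested and assumes the WANT data set with SORTED_VARS from the previous step; the data set names WANT_SORTED and WANT_IDS are made up for illustration. Note that ID_NEW follows the sort order of SORTED_VARS, so the groups will be the same as in the question but the numbers attached to them may differ.

```sas
/* Sketch only: WANT_SORTED and WANT_IDS are illustrative names */
proc sort data=want out=want_sorted;
  by sorted_vars;
run;

data want_ids;
  set want_sorted;
  by sorted_vars;
  /* Sum statement: ID_NEW starts at 0 and is retained across rows */
  if first.sorted_vars then id_new + 1;
run;

/* Optional: put the rows back in their original ID order */
proc sort data=want_ids;
  by id;
run;
```

If the values can be longer than one character, using catx with a delimiter instead of cats when building sorted_vars would avoid two different combinations producing the same key.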
