Hey Lyla
Hey Lyla

Reputation: 45

SAS - Find and assign an ID base on all possible combinations of more variables

I need to assign the same ID to every combination of same values in three variables and i really have no clue of what can I do to spot combination of the same three letters among the three variables (A-B-C should match with B-C-A) Here's my input data:

data HAVE;
input ID VAR1 VAR2 VAR3 $;
DATALINES;
001 A B C
002 A C B
003 B C A
004 A B 
005 B A 
006 D E F
007 E F D
008 F E D
009 E F 
010 F E
;
RUN;

And the resulting ID_NEW should be:

data HAVE;
input ID VAR1 VAR2 VAR3 $ ID_NEW;
DATALINES;
001 A B C 1
002 A C B 1
003 B C A 1
004 A B   2
005 B A   2
006 D E F 3
007 E F D 3
008 F E D 3
009 E F   4
010 F E   4
;
RUN;

I am able to spot combination of two by proc sql and performing a left join with keys t1.var1=t2.var2 and t1.var2=t2.var1 but it comes to spot the three letters combinations, I want to avoid the join because I could have 6 possible combinations and I feel that there's a smart way to so, without repeating the join 6 times! Perhaps with a combination of catt and scan functions?

Thank you in advance for your help :) !

Upvotes: 1

Views: 437

Answers (2)

Richard
Richard

Reputation: 27508

ID_NEW is based on combination. Each unique combination can be thought of as a set element. The set element is arrived at by sorting the data values (which converts a values permutation to a distinct permutation representing the values combination) and concatenating them with delimiters into a single value that is a set element. The delimiting ensures a separation of contributing value in the element. A set element can also be thought of as a hash key, and there in comes the single-pass solution.

Example

perm     sort     element/key
A B C -> A B C -> A,B,C
B C A -> A B C -> A,B,C

CALL SORTC will sort variables, but you do not want to sort the actual variables, so the values have to be copied into other variables that will be sorted (and dropped from output)

A hash can associate a key value with a combination_id. The combo_id is incremented each time a new key value is encountered, or retrieved when the key already exists.

Example:

data HAVE;
input ID VAR1 $ VAR2 $ VAR3 $;
DATALINES;
001 A B C
002 A C B
003 B C A
004 A B .
005 B A .
006 D E F
007 E F D
008 F E D
009 E F .
010 F E .
011 C A B
012 E D F
RUN;

data want;
  if 0 then set have; * prep pdv;

  length _key $100 combo_id 8; * host variables for hash;

  if _n_ = 1 then do;
     declare hash ids();
     ids.defineKey ('_key');
     ids.defineData ('combo_id');
     ids.defineDone();
  end;

  set have;

  * copy data into variables that will be sorted;

  * combination of raw data;
  * use cats in case some of variables are numeric;
  _item1 = cats(var1);
  _item2 = cats(var2);
  _item3 = cats(var3);

  * alternative if combo_id is to be based on formatted data values;
  * use vvalue to obtained formatted representation of data values;
  * _item1 = vvalue(var1);
  * _item2 = vvalue(var2);
  * _item3 = vvalue(var3);

  array items _item1-_item3;  * arrayify the _items so they can be sorted;

  call sortc(of items(*));

  * delimit with hex 255, a character not likely to occur
  * in the data value or formatted representation;

  _key = catx('FF'x, of items(*)); 

  * retrieve or increment the combo_id;

  if ids.find() ne 0 then do;
    combo_id = ids.num_items + 1;
    ids.add();
  end;

  drop _:;
run;

Upvotes: 1

Chris Long
Chris Long

Reputation: 1319

You will be able to do this using the SORTC function, which sorts an array of character values into alphabetical order:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a003106052.htm

One way would be to create a new variable that contains the values of VAR1-VAR3 in alphabetical order:

data want;
  length sorted_vars $ 20;
  set have;
  array vars[*] var1-var3;
  call sortc(of vars[*]);
  sorted_vars = cats(of vars[*]);
run;

The above code isn't tested but should be pretty close. From there, you can sort on sorted_vars and increment your id_new variable on each first.sorted_vars.

Upvotes: 0

Related Questions