green2010
green2010

Reputation: 21

SAS: comparisons across multiple columns for pairs of IDs

I am working with data that derives from an 'indicate all that apply' question. Two raters were asked to complete the question for a unique subject list. The data looks something like this.

ID| Rater|Q1A|Q1B|Q1C|Q1D
------------------------
1 | 1    | A | F | E | B
1 | 2    | E | G | 
2 | 1    | D | C | A
2 | 2    | C | D | A

I want to compare the two raters' answers for each ID and determine whether answers for Q1A-Q1D are the same. I am not interested in the direct comparisons between each rater by ID for Q1A, Q1B, etc. individually. I want to know if all the values in Q1A-Q1D as a set are the same. (E.g., in the example data above, the raters for ID 2 would be identical). I am assuming I would do this with an array. Thanks.

Upvotes: 2

Views: 118

Answers (3)

Tom
Tom

Reputation: 51566

Sort, Concatenate, then compare.

data want ;
  set ratings;
  by id;
  call sortc(of Q1A -- Q1D);
  rating = cats(of Q1A -- Q1D);
  retain rater1 rating1 ;
  if first.id then rater1=rater;
  if first.id then rating1=rating;
  if not first.id ;
  rater2 = rater ;
  rating2 = rating;
  match = rating1=rating2 ;
  keep id rater1 rater2 rating1 rating2 match;
run;

Upvotes: 0

user667489
user667489

Reputation: 9569

This looks like a job for call sortc:

data have;
infile cards missover;
input ID  Rater (Q1A Q1B Q1C Q1D) ($);
cards;
1   1      A   F   E   B
1   2      E   G   
2   1      D   C   A
2   2      C   D   A
3   1      A   B   C
3   2      A   B   D
;
run;

/*You can use an array if you like, but this works fine too*/
data temp /view = temp;
  set have;
  call sortc(of q:);
run;

data want;
  set temp;
  /*If you have more questions, extend the double-dash list to cover all of them*/
  by ID Q1A--Q1D notsorted;
  /*Replace Q1D with the name of the variable for the last question*/
  IDENTICAL_RATERS = not(first.Q1D and last.Q1D);
run;

Upvotes: 1

Dominic Comtois
Dominic Comtois

Reputation: 10401

Here is a similar solution also using call sortc, but rather using vectors and retain variables.

Create example dataset

data ratings;
  infile datalines truncover;
  input ID Rater (Q1A Q1B Q1C Q1D) ($);
  datalines;
1 1 A F E B
1 2 E G
2 1 D C A
2 2 C D A
3 1 A B C
3 2 A B D
;

Do the comparison

data compare(keep=ID EQUAL);
  set ratings;
  by ID;
  format PREV_1A PREV_Q1B PREV_Q1C PREV_Q1D $1.
         EQUAL 1.;
  retain PREV_:;
  call sortc(of Q1:);
  array Q(4) Q1:;
  array PREV(4) PREV_:;
  if first.ID then do;
    do _i = 1 to 4;
      PREV(_i) = Q(_i);
    end;
  end;
  else do;
    EQUAL = 1;
    do _i = 1 to 4;
      if Q(_i) NE PREV(_i) then EQUAL = 0;
    end;
    output;
  end;
run;

Results

ID EQUAL 
1  0 
2  1 
3  0 

Upvotes: 1

Related Questions