burton030
burton030

Reputation: 405

Comparison of two data sets in SAS

I have the following data set:

data data_one;
length X 3
Y $ 20; 

input x y ;

datalines;
1 test
2 test
3 test1
4 test1
5 test
6 test
7 test1

    run;

data data_two;
length Z 3
       A $ 20;

input Z A;

datalines;
1 test
2 test1
3 test2
run;

What I would like to have is a data set which tells me how often column Y in data_one contains the same string of column A in data_two. The result should look like this one:

 Obs    test    test1    test2

  1       4       3        0

Thanks in advance!

Upvotes: 1

Views: 76

Answers (1)

DaBigNikoladze
DaBigNikoladze

Reputation: 661

  1. First we need the counts for those values of Y present in data_one.
  2. Then we create a sorted (for the next merge) list of the values present in data_two.
  3. The data_one Y counts from 1. are merged with the list from 2. The Y values present in data_two but not in data_one (b and not a) are assigned count=0, the Y values not present in data_two are discarded (if b).
  4. The last passage transposes the vertical list of counts in an horizontal set of variables.

proc freq data=data_one noprint;
    table y / out=count_one (keep=y count);
run;
proc sort data=data_two out=list_two (keep=a rename=(a=y)) nodupkey;
    by a;
run;
data count_all;
    merge count_one (in=a) list_two (in=b);
    by y;
    if (b and not a) then count=0;
    if b;
run;
proc transpose data=count_all out=final (drop=_name_ _label_);
    id y;
run;

The first 3 steps can be replaced with one proc SQL:

proc sql;
    create table count_all as
    select distinct
            coalesce(t1.y,t2.a) as y,
            case
                when missing(t1.y) then 0 
                else count(t1.y)
            end as N
        from data_one as t1
        right join data_two as t2
            on t1.y=t2.a
        group by 1
        order by 1;
quit;
proc transpose data=count_all out=final (drop=_name_);
    id y;
run;

Upvotes: 1

Related Questions