Reputation: 405
I have the following data set:
data data_one;
length X 3
Y $ 20;
input x y ;
datalines;
1 test
2 test
3 test1
4 test1
5 test
6 test
7 test1
run;
data data_two;
length Z 3
A $ 20;
input Z A;
datalines;
1 test
2 test1
3 test2
run;
What I would like to have is a data set which tells me how often column Y in data_one contains the same string of column A in data_two. The result should look like this one:
Obs test test1 test2
1 4 3 0
Thanks in advance!
Upvotes: 1
Views: 76
Reputation: 661
(b and not a)
are assigned count=0, the Y values not present in data_two are discarded (if b
).proc freq data=data_one noprint;
table y / out=count_one (keep=y count);
run;
proc sort data=data_two out=list_two (keep=a rename=(a=y)) nodupkey;
by a;
run;
data count_all;
merge count_one (in=a) list_two (in=b);
by y;
if (b and not a) then count=0;
if b;
run;
proc transpose data=count_all out=final (drop=_name_ _label_);
id y;
run;
The first 3 steps can be replaced with one proc SQL:
proc sql;
create table count_all as
select distinct
coalesce(t1.y,t2.a) as y,
case
when missing(t1.y) then 0
else count(t1.y)
end as N
from data_one as t1
right join data_two as t2
on t1.y=t2.a
group by 1
order by 1;
quit;
proc transpose data=count_all out=final (drop=_name_);
id y;
run;
Upvotes: 1