Reputation: 139
I have two datasets DA and DB, both contains the variable ssn and are sorted by ssn. How do I check whether DA contains some ssn that is not listed in DB and vice versa? I considered proc compare but it seems that the compare procedure can only compare the observations at the same position. For example, if
data DA;
input ssn &;
datalines;
100-00-0001
100-00-0003
100-00-0004
;
data DB;
input ssn &;
datalines;
100-00-0001
100-00-0002
100-00-0003
;
Then the results of compare will list 100-00-0003 and 100-00-0004 although both datasets contain 100-00-0003. How to solve this issue?
Upvotes: 0
Views: 1663
Reputation: 4554
Use proc sql:
proc sql;
select * from DA where ssn not in(select ssn from DB);
select * from DB where ssn not in(select ssn from DA);
quit;
Upvotes: 0
Reputation: 6378
You can use an ID
statement in PROC COMPARE. Untested:
proc compare base=da compare=db listobs ;
id SSN ;
run ;
There are options to output to a dataset as well.
Upvotes: 0
Reputation: 9569
Merge the two datasets together by SSN and output to different datasets, e.g.
data A_only B_only;
merge DA(in = a keep = SSN)
DB(in = b keep = SSN);
by SSN;
if a and not(b) then output a_only;
if b and not(a) then output b_only;
run;
Upvotes: 1