user3305018
user3305018

Reputation: 57

Compare each row of one dataset with another dataset

Just a general question lets say I have two datasets called dataset1 and dataset2 and If I want to compare the rows of dataset1 with the complete dataset2 so essentially compare each row of dataset1 with dataset2. Below is just an example of the two datasets

Dataset1

EmployeeID  Name    Employeer
12345   John    Microsoft
1234567 Alice   SAS
1234565 Jim IBM

Dataset1

EmployeeID2 Name    DateAbsent
12345   John    25/06/2009
12345   John    26/06/2009
1234567 Alice   27/06/2010
1234567 Alice   30/06/2011
1234567 Alice   2/8/2012
12345   John    28/06/2009
12345   John    25/07/2009
12345   John    25/08/2009
1234565 Jim 26/08/2009
1234565 Jim 27/08/2010
1234565 Jim 28/08/2011
1234565 Jim 29/08/2012

I have written some programming logic its not sas code, this is just my logic

for item in dataset1:
   for item2 in dataset2:
     if item.EmployeeID=item2.EmployeeID2 and item.Name=item2.Name then output newSet

Upvotes: 0

Views: 1572

Answers (3)

J  Calbreath
J Calbreath

Reputation: 2705

Is your goal to compare the two dataset and see where there are differences? Proc Compare will do this for you. You can compare specific columns or the entire dataset.

Upvotes: 0

DomPazz
DomPazz

Reputation: 12465

To do this in a Data step, the data sets need to be sorted by the variables to join on (or indexed). Also the variable names need to be the same, so I will assume both variables are EmployeeID.

/*sort*/
proc sort data=dataset1;
by EmployeeID Name;
run;

proc sort data=dataset2;
by EmployeeID Name;
run;

data output;
merge dataset1 (in=ds1) dataset2 (inds2);
by EmployeeID Name;
if ds1 and ds2;
run;

The data step does the loop for you. It needs sorted sets because it only takes 1 pass over the data sets. The if clause checks to make sure you are getting a value from both data sets.

Upvotes: 1

DomPazz
DomPazz

Reputation: 12465

This is an inner join.

proc sql noprint;
create table output as
select a.EmployeeId,
       a.Name,
       a.Employeer,
       b.DateAbsent
   from dataset1 as a
     inner join
        dataset2 as b
     on a.EmployeeID = b.EmployeeID2
    and a.Name = b.name;
quit;

I recommend reading the SAS documentation on PROC SQL if you are unfamiliar with the syntax

Upvotes: 1

Related Questions