Joe
Joe

Reputation: 31

Using merge in sas

I am a little bit confused about merging in SAS. For example, when people are using the merge statement, sometimes (in=a) or (in=b) is followed. What does that do exactly?

Upvotes: 0

Views: 3532

Answers (3)

J_Lard
J_Lard

Reputation: 1103

To elaborate more on vknowles answer, in=a and in=b are useful in different types of merges. Let's say we have the following data step:

data inner left right outer;
  merge have1(in=a) have2(in=b);
  by ...;
  if a and b then output inner;
  else if a and not b then output left;
  else if not a and b then output right;
  else if not (a or b) then output miss;
run; 

The data step will create 4 different datasets which are the basis of an inner join, left join, and right join.

  • The statement if a and b then output inner; will output only records in which the key is found in the datasets have1 and have2 which is the equivalent of a SQL inner join.
  • else if a and not b then output left; will output only the records that occur in the have1 dataset and not the have2. This is the equivalent of a left outer join in SQL. If you wanted a full left join you could either append the left dataset to the inner dataset or just change the statement to if (a and b) or (a and not b) then output left.
  • The third else if is just the opposite of the previous. Here you can perform a right join on the data.
  • The last else if will output to the outer dataset which is the equivalent of an outer join. This is useful for debugging purposes as the key is unique to each dataset. Thanks to Robert for this addition.

Upvotes: 4

Tom
Tom

Reputation: 51621

When you see a dataset referenced in SAS with () following it the items inside are called dataset options. The IN= dataset option is valid when reading datasets using statements like SET, MERGE, UPDATE. The word following IN= names a variable that SAS will create that will be true (1) when that dataset contributes to the current observation and false (0) otherwise.

A good example would be if want to use one data set to subset another. For example if you wanted to merge on data from a master lookup table to add an extra variable like an address or account type , but did not what to add in every id from the lookup table.

data want;
   merge my_data(in=in1) master_lookup (in=in2);
   by id;
   if in1 ;
run;

Or if you are stacking or interleaving data from more than one table and wanted to take action depending on which table this record is from.

data want;
  set one(in=in1) two(in=in2);
  by id;
  if in1 then source='ONE';
  if in2 then source='TWO';
run;

Upvotes: 1

vknowles
vknowles

Reputation: 784

Let's say you have MERGE A (in=INA) B (in=INB);

When merging two datasets with a BY statement, you can have the following situations:

Dataset A has an observation with a given by-value and dataset B does not. In this case, INA will be true and INB will be false.

Dataset A and B both have an observation with a given by-value. In this case, INA and INB will be true.

[Edited to correct error] Dataset A and B have different numbers of observations with a given by-value. Let's say A has more than B. Even after B runs out of observations, both INA and INB will be true. If you want to know whether B still has observations, you need something like the following code. As @Tom pointed out, you often want to know which dataset is contributing observations.

data want;
  ina=0;
  inb=0;
  merge a (in=ina) b (in=inb);
  by mybyvariable;
  ...
run;

The above code takes advantage of the fact that SAS retains the variables from the last observation contributed to the by-group by the dataset with the smaller number of observations. If you reinitialize any of the variables before the MERGE statement, and there is no new observation, they will keep their reinitialized values; but if there is a new observation, SAS will reset them. I hope that makes sense.

Dataset A does not have an observation with a given by-value and B does. In this case, INA will be false and INB will be true.

Note that this does not cover the situation of merging without a BY statement.

Upvotes: 0

Related Questions