mustafghan
mustafghan

Reputation: 171

about data merge: the in= option in SAS

I'm a little confused by what the following in= step does.

Here is the code:

data data1;
merge data2 data3 (in=inb);
by ID;
if inb;
run; 

I would really appreciate if someone can tell me what the in=inb here does.

Upvotes: 1

Views: 6489

Answers (5)

user667489
user667489

Reputation: 9569

One other aspect of the behaviour of the in= option that I don't think anyone else has mentioned - if you merge two different datasets using the same in= variable for both, and a row is in one but not the other, a value of 1 takes precedence over a value of 0. E.g.

data test;
    merge sashelp.class(where = (sex = 'F') in = a) 
          sashelp.class(where = (sex = 'M') in = a);
    by name;          
    put _all_;
run;    

In this case, a = 1 for every row, even though each row is only present in one of the input datasets.

Upvotes: 1

Doug Dame
Doug Dame

Reputation: 36

Functionally,

 merge data2 data3 (in=inb);
 by ID;
    if inb;

is the same as a right join in SQL.

Technically, "inb" is a 0/1 flag set to "1" for each record found in data3. "if inb" is shorthand for "if inb is true [then keep the record]", and for numeric fields "true" means greater than zero.

Upvotes: 0

stat
stat

Reputation: 669

because the option (in=inb) is after data3, it is referred to this dataset. hence, you will have a boolean variable that will be 1 in data1 (final dataset) if that observation was present in data3, 0 otherwise.

Data2 Data3
ID       ID
A        A
B        B
C        D

You will have

Data3
ID INB
A  1
B  1
C  0
D  1

Adding the statement if INB; you will keep only observations with INB=1 (observations coming from data3)

Data3
ID
A
B
D

Upvotes: 0

stat
stat

Reputation: 669

DS_A         DS_B

ID VAR1      ID VAR2
A   X        A   X
B   X        B   X
C   X        D   X

data want;
merge ds_a ds_b;
by id;
run;

will produce this

WANT:

ID VAR1 VAR2
A   X    X
B   X    X
C   X  
D        X

If you add the IN= option you add a temporary and hidden variable that is 1 when the observation is present in that dataset, 0 otherwise, like this:

DS_A         DS_B

ID VAR1      ID VAR2
A   X        A   X
B   X        B   X
C   X        D   X

data want;
merge ds_a (in=frs) ds_b (in=scn);
by id;
run;

WANT:
ID VAR1 VAR2  FRS  SCN
A   X    X     1    1
B   X    X     1    1
C   X          1    0
D        X     0    1

So you can play with this hidden variable to keep observations from one dataset or from both or from only one etc...

if frs; ---> keep ID=A B C
if scn; ---> keep ID=A B D
if frs and scn ---> keep ID=A B
if frs and not scn --> keep ID=C
etc..

Upvotes: 4

D. Josefsson
D. Josefsson

Reputation: 1052

According to SYNTAX section of the merge Statement documentation, the data sets you are merging can have options. In this case you are using IN= Data Set Option. Below is the explanation of this option:

Creates a Boolean variable that indicates whether the data set contributed data to the current observation.

So in this case, you are naming this boolean variable inb.

Upvotes: 0

Related Questions