Reputation: 171
I'm a little confused by what the following in= step does.
Here is the code:
data data1;
merge data2 data3 (in=inb);
by ID;
if inb;
run;
I would really appreciate if someone can tell me what the in=inb here does.
Upvotes: 1
Views: 6489
Reputation: 9569
One other aspect of the behaviour of the in=
option that I don't think anyone else has mentioned - if you merge two different datasets using the same in=
variable for both, and a row is in one but not the other, a value of 1 takes precedence over a value of 0. E.g.
data test;
merge sashelp.class(where = (sex = 'F') in = a)
sashelp.class(where = (sex = 'M') in = a);
by name;
put _all_;
run;
In this case, a = 1 for every row, even though each row is only present in one of the input datasets.
Upvotes: 1
Reputation: 36
Functionally,
merge data2 data3 (in=inb);
by ID;
if inb;
is the same as a right join in SQL.
Technically, "inb" is a 0/1 flag set to "1" for each record found in data3. "if inb" is shorthand for "if inb is true [then keep the record]", and for numeric fields "true" means greater than zero.
Upvotes: 0
Reputation: 669
because the option (in=inb) is after data3, it is referred to this dataset. hence, you will have a boolean variable that will be 1 in data1 (final dataset) if that observation was present in data3, 0 otherwise.
Data2 Data3
ID ID
A A
B B
C D
You will have
Data3
ID INB
A 1
B 1
C 0
D 1
Adding the statement if INB; you will keep only observations with INB=1 (observations coming from data3)
Data3
ID
A
B
D
Upvotes: 0
Reputation: 669
DS_A DS_B
ID VAR1 ID VAR2
A X A X
B X B X
C X D X
data want;
merge ds_a ds_b;
by id;
run;
will produce this
WANT:
ID VAR1 VAR2
A X X
B X X
C X
D X
If you add the IN= option you add a temporary and hidden variable that is 1 when the observation is present in that dataset, 0 otherwise, like this:
DS_A DS_B
ID VAR1 ID VAR2
A X A X
B X B X
C X D X
data want;
merge ds_a (in=frs) ds_b (in=scn);
by id;
run;
WANT:
ID VAR1 VAR2 FRS SCN
A X X 1 1
B X X 1 1
C X 1 0
D X 0 1
So you can play with this hidden variable to keep observations from one dataset or from both or from only one etc...
if frs; ---> keep ID=A B C
if scn; ---> keep ID=A B D
if frs and scn ---> keep ID=A B
if frs and not scn --> keep ID=C
etc..
Upvotes: 4
Reputation: 1052
According to SYNTAX
section of the merge
Statement documentation, the data sets you are merging can have options. In this case you are using IN= Data Set Option. Below is the explanation of this option:
Creates a Boolean variable that indicates whether the data set contributed data to the current observation.
So in this case, you are naming this boolean variable inb
.
Upvotes: 0