Reputation: 59
I'm working with df1
that contains 180K rows and 27 factors. I used 3 factors in a merge with df2
(43K obs, 10 factors, 3 factors match with df1
). In df1
, the rows are observations repeated over months for several years for TrapLines
and TrapNum
. There are 3393 unique levels associated with TrapLines
& TrapNum
. In df2
, Shade
varies by month for TrapLine
and Trapnum
: for each TrapLine
/Trapnum
entry, there are 12 values of Shade
. In df2
, there are 3268 unique levels associated with Trapline
/Trapnum
. I want to merge (if that's the correct process) df1
and df2
such that the monthly Shade
data is associated with the correct Trapline
/TrapNum
in df1
.
df1:
TrapLine TrapNum Date Month_num
A24 D101 1-1-2018 1
A24 D101 2-13-2017 2
A24 D102 3-30-2017 3
A24 D102 4-10-2019 4
A24 D103 4-10-2019 4
A24 D103 5-15-2016 6
etc.
df2:
TrapLine TrapNum Shade Month Month_num
A24 D101 217 Jan 1
A24 D101 201 Feb 2
A24 D101 172 Mar 3
A24 D102 145 Mar 3
A24 D102 97 Apr 4
A24 D102 59 May 5
etc.
I used the following code:
df3 <- merge (df1, df2, by c("TrapLine", "TrapNum", "Month_num"))
However, in the process, I lose 40K observations from df1
! I'm at a loss to explain what happened: I would've thought that if there weren't matches between TrapLine
/TrapNum
that the Shade
values would've been set to NA
or the like, not that observations would be lost altogether. Can someone please help me understand what's happened and what process could be used such that I retain all of the observations (I'll worry about eliminating observations without Shade later!). Thank you for your time and attention.
Cheers, Doug
Upvotes: 1
Views: 842
Reputation: 2042
Use the argument all
of the merge
function. By default, all=FALSE
, so it does a "inner join", and drops all unmatched observations. If you set merge(..., all=TRUE)
you have a "outer join" and keep all observations. See https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/merge
Upvotes: 1