Reputation: 3
I was trying to use inner_join to merge two data frames. The problem is that the merged result only contains the variables from one of the data frames.
I expected the two data frames to be merged so that unmatched observations are dropped while the variables from both are kept. The two data frames in my case are named cpds and gtd.
I am pretty sure that I have two unique identifiers (in my case, state and year) and that these variable names are the same in both data frames. The result does drop all unmatched observations; however, it only contains the variables from one of the data frames.
Here is my code:
library(dplyr)
terdemo <- inner_join(cpds,gtd)
R then prints the following messages, including a warning:
Joining, by = c("country", "year")
Warning message:
In inner_join_impl(x, y, by$x, by$y, suffix$x, suffix$y) :
joining character vector and factor, coercing into character vector
Could anyone explain why this warning is triggered?
Note: I use RStudio version 1.0.136 on macOS Sierra 10.12.3. The relevant package is dplyr.
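The warning text itself says that one join key is a character vector in one data frame and a factor in the other, so dplyr coerces both to character before matching. A minimal sketch, using hypothetical toy tables standing in for cpds and gtd (the values and the gdp/attacks columns are made up), converts the factor key to character first so both keys have the same type:

```r
library(dplyr)

# Hypothetical stand-ins for cpds and gtd: `country` is a factor in one
# table and a character vector in the other.
cpds_toy <- data.frame(
  country = factor(c("Austria", "Belgium", "Canada")),
  year = c(2000L, 2000L, 2000L),
  gdp = c(1.1, 2.2, 3.3)
)
gtd_toy <- data.frame(
  country = c("Austria", "Canada", "Denmark"),
  year = c(2000L, 2000L, 2000L),
  attacks = c(4L, 5L, 6L),
  stringsAsFactors = FALSE
)

# Give both key columns the same type before joining, so dplyr does not
# have to coerce a factor/character mismatch.
cpds_toy <- mutate(cpds_toy, country = as.character(country))

terdemo_toy <- inner_join(cpds_toy, gtd_toy, by = c("country", "year"))
print(terdemo_toy)
```

Only Austria and Canada appear in both tables, so only those rows are kept, and the result has the columns of both inputs (country, year, gdp, attacks).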
Upvotes: 0
Views: 474
Reputation: 3
I found the answer to my problem. inner_join works fine; the reason I could not find certain variables is that there are too many (>75) variables after merging, so some of them were not displayed when I used View(). You can use names() or summary() to check all the variables you have after merging. Hope this helps.
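A minimal sketch of that check, assuming two hypothetical wide toy tables (the names wide_a, wide_b and the x1..x40 / y1..y40 columns are made up for illustration): names() and ncol() report every column of the merged result, regardless of how many View() displays.

```r
library(dplyr)

# Hypothetical wide tables: the keys plus 40 extra variables on each side.
wide_a <- data.frame(state = c("A", "B"), year = c(2000L, 2001L),
                     stringsAsFactors = FALSE)
wide_b <- wide_a
for (i in 1:40) {
  wide_a[[paste0("x", i)]] <- i    # scalar is recycled to every row
  wide_b[[paste0("y", i)]] <- -i
}

merged <- inner_join(wide_a, wide_b, by = c("state", "year"))

# names() lists every column, even when View() does not show them all.
ncol(merged)    # 2 keys + 40 + 40 = 82 columns
names(merged)

# Check that a specific variable from each input survived the join.
c("x40", "y40") %in% names(merged)
```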
Upvotes: 0
Reputation: 521249
You should probably always join explicitly by specifying the by parameter of the join, i.e.
terdemo <- inner_join(cpds, gtd, by=c("state" = "state", "year" = "year"))
However, this should not have anything to do with what you are observing. It is the behavior of inner_join() that, for each pair of join columns in the two data frames being joined, only one of them appears in the output. If you perceive columns as being dropped, the most likely explanation is that you are noticing that one or both of the join columns from one data frame have been omitted from the result data frame.
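A small sketch of that behavior, using hypothetical toy tables a and b (made-up data, not the asker's cpds/gtd): each pair of join columns appears once in the output, while every non-key column from both inputs is kept.

```r
library(dplyr)

# Hypothetical toy data: both tables share the keys `state` and `year`.
a <- data.frame(state = c("X", "Y", "Z"), year = c(2000L, 2000L, 2001L),
                pop = c(10, 20, 30), stringsAsFactors = FALSE)
b <- data.frame(state = c("X", "Z", "W"), year = c(2000L, 2001L, 2001L),
                events = c(1L, 2L, 3L), stringsAsFactors = FALSE)

res <- inner_join(a, b, by = c("state" = "state", "year" = "year"))

# Each key pair appears once; all non-key columns from both inputs remain,
# so the result has 3 + 3 - 2 = 4 columns: state, year, pop, events.
names(res)
nrow(res)  # only the matched rows X/2000 and Z/2001 are kept
```

A non-key column that exists in both tables is not dropped either; dplyr keeps both copies and, by default, distinguishes them with the suffixes .x and .y.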
Upvotes: 1