Reputation: 2077
I'm struggling with joining two data sets
#df1
id name1
1 a
2 b
3 c
and
#df2
id name2
1 c
2 d
I try to join them by their id
library(dplyr)
result <- left_join(df1, df2, by="id")
It gives me the following error:
Error: cannot join on columns 'id' x 'id': Can't join on 'id' x 'id' because of incompatible types (factor / integer)
because they have different classes:
sapply(df1, class)
id name1
"factor" "factor"
sapply(df2, class)
id name2
"integer" "factor"
I tried to change the classes to make them similar
df1$id <- as.integer (df1$id)
but it doesn't help find the common rows in the two datasets (it cannot recognize the matching "id"s in df2).
Upvotes: 0
Views: 4462
Reputation: 195
I ran into the same problem when converting from character to numeric before joining tables. Converting straight to numbers didn't work, even with the method above.
I had to use as.integer(levels(df1$id))[df1$id]
to make it work.
I tried using as.numeric(levels(df1$id))[df1$id]
and it turned all of my values into NA.
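For illustration, here is a minimal sketch of that conversion followed by the join. The data frames are hypothetical stand-ins for the ones in the question, with ids chosen so that the factor's internal codes differ from its labels. Base R's merge() with all.x = TRUE is used as a stand-in for the left join so the example has no package dependency; once both id columns are integer, left_join(df1, df2, by = "id") should behave the same way.

```r
# Hypothetical data: df1$id is a factor, df2$id is an integer
df1 <- data.frame(id = factor(c(10, 20, 30)),
                  name1 = c("a", "b", "c"))
df2 <- data.frame(id = c(10L, 20L),
                  name2 = c("c", "d"))

# Convert the factor's labels (not its internal codes) to integers
df1$id <- as.integer(levels(df1$id))[df1$id]

# Left join in base R: keep every row of df1, match on id
result <- merge(df1, df2, by = "id", all.x = TRUE)
print(result)
```

The row with id 30 has no match in df2, so its name2 comes back as NA.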
Hope this helps!
Upvotes: 0
Reputation: 3597
From the help page: as.numeric(levels(f))[f] is recommended instead of as.numeric(as.character(f)).
The issue with factor => numeric/integer conversion has been comprehensively answered by @Joshua Ulrich here.
Seek and ye shall find, but the user needs to know what to look for to reach the answer.
The Warning section of the documentation for ?factor says:
The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
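A small sketch of what that warning means in practice, using a made-up factor f whose labels are numbers stored as text:

```r
# Hypothetical factor: numeric values stored as text labels
f <- factor(c("10", "20", "10", "30"))

codes   <- as.numeric(f)                # internal codes: 1 2 1 3
values  <- as.numeric(levels(f))[f]     # original values: 10 20 10 30
values2 <- as.numeric(as.character(f))  # same values, converts every element

print(codes)
print(values)
```

Calling as.numeric (or as.integer) directly on the factor returns the position of each label in levels(f), which is why the join in the question could not find matching ids after the conversion.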
This step can be avoided by setting stringsAsFactors=FALSE when reading the input data. That side-steps the conversion of character variables to factors unless factors are absolutely essential, i.e. when the levels of the factors are required in the analysis.
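As a rough sketch, assuming the data comes from a CSV file (the inline text below is a hypothetical stand-in for the real file):

```r
# Hypothetical CSV content standing in for the real input file
csv_text <- "id,name1\n1,a\n2,b\n3,c"

# Read text columns as character instead of factor
df1 <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the resulting column classes
print(sapply(df1, class))
```

Here id is read as integer and name1 as character, so no factor conversion is needed before joining. Since R 4.0.0, stringsAsFactors = FALSE is already the default for read.csv and data.frame.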
Upvotes: 1