MFR

Reputation: 2077

Joining two datasets with different classes

I'm struggling to join two data sets:

 #df1

  id   name1
   1    a
   2    b
   3    c

and

 #df2

  id     name2
  1       c
  2       d

I tried to join them by id:

  library(dplyr)

  result <- left_join(df1, df2, by = "id")

It gives me the following error:

Error: cannot join on columns 'id' x 'id': Can't join on 'id' x 'id' because of incompatible types (factor / integer)

because they have different classes:

  sapply(df1, class)
        id     name1
  "factor"  "factor"

  sapply(df2, class)
         id     name2
  "integer"  "factor"

I tried converting the column so that the classes match:

  df1$id <- as.integer(df1$id)

but that doesn't help: the join still finds no common rows between the two datasets (the converted ids are not recognized as matching the ids in df2).

Upvotes: 0

Views: 4462

Answers (2)

Luke Holcomb

Reputation: 195

I ran into the same problem when converting from character to numeric before joining tables. Converting to numeric didn't work for me, even with the above method.

I had to use as.integer(levels(df1$id))[df1$id] to make it work.

When I tried as.numeric(levels(df1$id))[df1$id], all of my values turned into NA.

Hope this helps!
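
For anyone wondering why the plain as.integer() call from the question fails, here is a minimal sketch. The ids below are made up for illustration (they are not the asker's data) and are chosen so that the factor's internal codes differ from its labels:

```r
# Hypothetical ids, chosen so the internal codes differ from the labels.
f <- factor(c("10", "20", "30", "20"))

# as.integer() on a factor returns the internal level codes, not the labels.
codes <- as.integer(f)

# Converting the levels first and then indexing by the factor recovers
# the original values.
values <- as.integer(levels(f))[f]

print(codes)   # 1 2 3 2
print(values)  # 10 20 30 20
```

If the ids in df1 were stored like this, the converted column would hold 1, 2, 3 rather than the real ids, which would explain why the join finds no matching rows.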

Upvotes: 0

Silence Dogood

Reputation: 3597

From the help page: as.numeric(levels(f))[f] is recommended instead of as.numeric(as.character(f)).

The issue with factor => numeric/integer conversion has been comprehensively answered by @Joshua Ulrich here.

Seek and ye shall find, but you need to know what to look for to reach the answer.

The Warning section of the documentation for ?factor says:

The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
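
Applied to the join in the question, the recommended conversion could look roughly like the sketch below. The data frames are invented stand-ins for df1 and df2, and base R's merge() is used so the example runs without extra packages; left_join(df1, df2, by = "id") from the question should give the same rows once the id types match.

```r
# Hypothetical stand-ins for the question's data: factor ids in df1,
# integer ids in df2.
df1 <- data.frame(id = factor(c("10", "20", "30")),
                  name1 = c("a", "b", "c"))
df2 <- data.frame(id = c(10L, 20L),
                  name2 = c("c", "d"))

# Convert the factor labels (not the internal codes) to integers.
df1$id <- as.integer(levels(df1$id))[df1$id]

# Left join in base R: keep every row of df1, fill unmatched rows with NA.
result <- merge(df1, df2, by = "id", all.x = TRUE)
print(result)
```

Here ids 10 and 20 pick up their name2 values, and id 30 gets NA because it has no match in df2.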

This step can be avoided by setting stringsAsFactors=FALSE when reading the input data, which prevents character variables from being converted to factors. Keep factors only where they are genuinely needed, i.e. when the analysis relies on factor levels.
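
A small sketch of the difference, using an inline CSV string as a made-up input (read.csv accepts it through the text argument). Note that since R 4.0.0 the default is already stringsAsFactors = FALSE, so the explicit argument mainly matters on older versions:

```r
# Hypothetical input; in practice this would be a file path.
csv_text <- "id,name1\n10,a\n20,b\n30,c"

with_factors    <- read.csv(text = csv_text, stringsAsFactors = TRUE)
without_factors <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Character columns become factors only when stringsAsFactors = TRUE.
print(sapply(with_factors, class))
print(sapply(without_factors, class))
```

In this example the purely numeric id column is read as integer either way; a column normally ends up as a factor only when it contains something that cannot be parsed as a number, so it is worth checking the raw ids for stray text.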

Upvotes: 1
