Steve Rowe

Reputation: 19413

How can I use join on columns with different names?

I am trying to use join from the plyr library, but the key columns in my two data frames have different names. I am joining by country: one data frame names the column Country and the other country (the names differ only in case).

The command

foo <- join(ie, geo, by="Country")

gives me this error:

Error in `[.data.frame`(x, by) : undefined columns selected

How can I modify the by parameter to join the two different column names?

Upvotes: 3

Views: 2479

Answers (1)

James King

Reputation: 6355

Based on the documentation, this does not seem to be possible with plyr's join: by takes a single set of column names that must exist in both data frames. As pointed out in the comment, the base function merge will handle this with by.x = "Country" and by.y = "country", but merge is quite slow. I think the best option is to rename one of the columns (and change the name back after the join if you need to).
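Here is a minimal sketch of both options. The contents of ie and geo are made up for illustration (the question does not show them), and geo_renamed is just a hypothetical name for a renamed copy, so the original geo keeps its lowercase column and nothing needs to be changed back.

```r
library(plyr)  # provides join()

# Hypothetical stand-ins for the question's data frames (contents assumed).
ie  <- data.frame(Country = c("Ireland", "France"), users = c(10, 20),
                  stringsAsFactors = FALSE)
geo <- data.frame(country = c("Ireland", "France"), lat = c(53.4, 46.2),
                  stringsAsFactors = FALSE)

# Option 1: base merge() accepts a separate key name for each side.
foo_merge <- merge(ie, geo, by.x = "Country", by.y = "country")

# Option 2: rename the key in a copy of geo so both sides share "Country",
# then use plyr's join() as in the question.
geo_renamed <- geo
names(geo_renamed)[names(geo_renamed) == "country"] <- "Country"
foo_join <- join(ie, geo_renamed, by = "Country")
```

Note that merge sorts the result by the key by default, while join keeps the row order of its first argument, so the two results can differ in row order.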

Also consider using the join functions from dplyr, which are faster than those in plyr. For example:

> system.time(x<-inner_join(baseball, baseball, by = "id"))
   user  system elapsed 
  0.037   0.000   0.037 
> system.time(x<-join(baseball, baseball, by = "id"))
   user  system elapsed 
  0.943   0.002   0.945 
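As far as I know, the dplyr joins also accept a named character vector for by, which pairs a column in the first table with a differently named column in the second, so no renaming is needed. A short sketch, again with made-up contents for ie and geo:

```r
library(dplyr)  # provides inner_join()

# Hypothetical stand-ins for the question's data frames (contents assumed).
ie  <- data.frame(Country = c("Ireland", "France"), users = c(10, 20),
                  stringsAsFactors = FALSE)
geo <- data.frame(country = c("Ireland", "France"), lat = c(53.4, 46.2),
                  stringsAsFactors = FALSE)

# The name on the left of "=" is the key in ie, the value on the right is the key in geo.
foo <- inner_join(ie, geo, by = c("Country" = "country"))
```

The result keeps the key under the first table's name (Country here). The same by syntax works for left_join and the other dplyr join verbs.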

Upvotes: 1
