R-obert
R-obert

Reputation: 1039

Replace values in data frame based on other data frame in R

In the below example, userids is my reference data frame and userdata is the data frame where the replacements should take place.

> userids <- data.frame(USER=c('Ann','Jim','Lee','Bob'),ID=c(1,2,3,4))
> userids
  USER ID
1  Ann  1
2  Jim  2
3  Lee  3
4  Bob  4

> userdata <- data.frame(INFO=c('foo','bar','foo','bar'), ID=c('Bob','Jim','Ann','Lee'),AGE=c('43','33','53','26'), FRIENDID=c('Ann',NA,'Lee','Jim'))
> userdata
  INFO  ID AGE FRIENDID
1  foo Bob  43      Ann
2  bar Jim  33       NA
3  foo Ann  53      Lee
4  bar Lee  26      Jim

How do I replace ID and FRIENDID in userdata with the ID corresponding to USER in userids?

The desired output:

  INFO  ID AGE FRIENDID
1  foo   4  43        1
2  bar   2  33       NA
3  foo   1  53        3
4  bar   3  26        2

Upvotes: 18

Views: 10576

Answers (4)

Umair Rafique
Umair Rafique

Reputation: 108

Here's a possible solution, which will also work on datasets with multiple records of each ID, though we will need to coerce the ID and FRIENDID variables to character first:

> userdata$ID <- sapply(userdata$ID, function(x){gsub(x, userids[userids$USER==x, 2], x)})
> userdata$FRIENDID <- sapply(userdata$FRIENDID, function(x){gsub(x, userids[userids$USER==x, 2], x)})

Upvotes: 0

Tyler Rinker
Tyler Rinker

Reputation: 110034

This is a possibility:

library(qdap)
userdata$FRIENDID <- lookup(userdata$FRIENDID, userids)
userdata$ID <- lookup(userdata$ID, userids)

or to win the one line prize:

userdata[, c(2, 4)] <- lapply(userdata[, c(2, 4)], lookup, key.match=userids)

Upvotes: 2

agstudy
agstudy

Reputation: 121608

Here a try using sqldf to get the result as a multiple join on differents columns.

  library(sqldf)
  sqldf('SELECT d.INFO,d.AGE,i1.ID ,i2.ID FRIENDID
       FROM 
       userdata d
       INNER JOIN 
       userids i1 ON (i1.USER=d.FRIENDID)
       INNER JOIN
        userids i2 ON (i2.USER=d.ID)')

 INFO AGE ID FRIENDID
1  foo  43  1        4
2  foo  53  3        1
3  bar  26  2        3

But this this removes NA lines! maybe someone can suggest me something on how to deal with NA!

EDIT

Thanks to G. Grothendieck comment, replacing the INNER by LEFT we get the result.

 sqldf('SELECT d.INFO,d.AGE,i1.ID ,i2.ID FRIENDID
        FROM 
        userdata d
        LEFT JOIN 
        userids i1 ON (i1.USER=d.FRIENDID)
        LEFT JOIN
         userids i2 ON (i2.USER=d.ID)')
INFO AGE ID FRIENDID
1  foo  43  1        4
2  bar  33 NA        2
3  foo  53  3        1
4  bar  26  2        3

Upvotes: 0

Arun
Arun

Reputation: 118879

Use match:

userdata$ID <- userids$ID[match(userdata$ID, userids$USER)]
userdata$FRIENDID <- userids$ID[match(userdata$FRIENDID, userids$USER)]

Upvotes: 26

Related Questions