aiorr

Reputation: 599

Replace character value using another dataframe with multiple matching

test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet")
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
                      "alias1" = c("jdoe","sscarlet"),
                      "alias2" = c("jodoe","scarlet"))
want.vector <- c("John Doe","John Doe","John Doe","Sarah Scarlet","Sarah Scarlet","Sarah Scarlet")

> test.vector
[1] "jdoe"          "John Doe"      "jodoe"         "Sarah Scarlet" "sscarlet"      "scarlet" 

> test.df
      Full.Name   alias1  alias2
1      John Doe     jdoe   jodoe
2 Sarah Scarlet sscarlet scarlet     

> want.vector
[1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet" "Sarah Scarlet"

All the search results I found, like this one, involve exactly one match per value and use merge() or join(). In this case, however, each value can match any of several alias columns, and I am not sure how to approach it. A few things I tried (with butchered syntax):

  1. str_replace(test.vector,test.df[,-1],test.df[,1])
  2. recode(test.vector,test.df)
  3. join with by = c(test.df[,-1], test.vector) after changing test.vector into df

One thing to note is that the actual test.df in my project has multiple alias columns that are quite sparse (each alias relates to a specific location/position). I am not sure whether that makes a significant difference compared with the example above.

Upvotes: 1

Views: 49

Answers (1)

jay.sf

Reputation: 73262

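You could make an array with the same dimensions as your data frame, letting the first column recycle to fill it, so that every cell holds the full name for its row. Then loop over the test vector with sapply, comparing each value against the data frame and using the resulting logical matrix to subset the array.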

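# every cell of test.a holds the Full.Name of its row
test.a <- array(test.df[, 1], dim = dim(test.df))
# x == test.df is a logical matrix marking where x occurs; use it to pick the name
sapply(test.vector, function(x) test.a[x == test.df], USE.NAMES = FALSE)
# [1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet"
# [6] "Sarah Scarlet"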
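
If the real data frame has sparse alias columns, NA cells would carry through the x == test.df comparison and could show up in the result. A possible alternative, sketched here under the assumption that every non-NA cell is a unique spelling belonging to the name in its row, is to flatten the data frame into a key/value lookup and use match(). The names keys, values, keep and result are only illustrative.

```r
test.vector <- c("jdoe", "John Doe", "jodoe", "Sarah Scarlet", "sscarlet", "scarlet")
test.df <- data.frame(Full.Name = c("John Doe", "Sarah Scarlet"),
                      alias1 = c("jdoe", "sscarlet"),
                      alias2 = c("jodoe", "scarlet"),
                      stringsAsFactors = FALSE)

# Flatten all columns (Full.Name included) into one vector of known spellings.
# unlist() goes column by column, so repeating Full.Name once per column
# lines each spelling up with the full name of its row.
keys   <- unlist(test.df, use.names = FALSE)
values <- rep(test.df$Full.Name, times = ncol(test.df))

# Drop empty cells so that sparse (NA) aliases never take part in the lookup.
keep <- !is.na(keys)

# match() returns the position of the first hit, or NA when nothing matches.
result <- values[keep][match(test.vector, keys[keep])]
result
# [1] "John Doe"      "John Doe"      "John Doe"      "Sarah Scarlet" "Sarah Scarlet"
# [6] "Sarah Scarlet"
```

Values in test.vector that appear nowhere in test.df come back as NA, so the result always has the same length as the input.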

Upvotes: 1
