Reputation: 599
test.vector <- c("jdoe","John Doe","jodoe","Sarah Scarlet","sscarlet","scarlet")
test.df <- data.frame("Full.Name" = c("John Doe","Sarah Scarlet"),
"alias1" = c("jdoe","sscarlet"),
"alias2" = c("jodoe","scarlet"))
want.vector <- c("John Doe","John Doe","John Doe","Sarah Scarlet","Sarah Scarlet","Sarah Scarlet")
> test.vector
[1] "jdoe" "John Doe" "jodoe" "Sarah Scarlet" "sscarlet" "scarlet"
> test.df
Full.Name alias1 alias2
1 John Doe jdoe jodoe
2 Sarah Scarlet sscarlet scarlet
> want.vector
[1] "John Doe" "John Doe" "John Doe" "Sarah Scarlet" "Sarah Scarlet" "Sarah Scarlet"
All the search results I found, like this one, involve exactly one matching column, and merge() or join() is used.
In this case, however, there are multiple columns a value could match, and I am not sure how to approach this.
A few things I tried (with butchered syntax):
str_replace(test.vector,test.df[,-1],test.df[,1])
recode(test.vector,test.df)
by = c(test.df[,-1], test.vector)
after changing test.vector into a data frame.
One thing to note is that the actual test.df I have for the project has multiple columns that are quite sparse (since each alias relates to a specific location/position). I am not sure whether that makes a significant difference compared with the example above.
Upvotes: 1
Views: 49
Reputation: 73262
You could make an array with the same dimensions as your data frame and let the first column recycle, then loop over the test vector in an sapply, subsetting the array by the positions where the data frame matches each element.
test.a <- array(test.df[, 1], dim=dim(test.df))
sapply(test.vector, function(x) test.a[x == test.df], USE.NAMES=FALSE)
# [1] "John Doe" "John Doe" "John Doe" "Sarah Scarlet" "Sarah Scarlet"
# [6] "Sarah Scarlet"
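Since the question mentions that the real data frame has sparse alias columns, here is a minimal sketch of a related approach that skips missing aliases explicitly. It flattens the data frame into a named lookup vector instead of comparing against the whole data frame, because a comparison against NA yields NA and may leak into the subset. The alias3 column and its value "sscar" are hypothetical, added only to simulate sparsity; they are not part of the original data.

```r
test.vector <- c("jdoe", "John Doe", "jodoe", "Sarah Scarlet", "sscarlet", "scarlet")
test.df <- data.frame(Full.Name = c("John Doe", "Sarah Scarlet"),
                      alias1 = c("jdoe", "sscarlet"),
                      alias2 = c("jodoe", "scarlet"),
                      alias3 = c(NA, "sscar"),  # hypothetical sparse column
                      stringsAsFactors = FALSE)

# Flatten every column (Full.Name included) into one vector of keys, column by column
keys <- unlist(test.df, use.names = FALSE)

# Repeat the full names once per column so they line up with the flattened keys
vals <- rep(test.df$Full.Name, times = ncol(test.df))

# Drop the missing aliases, then build a named lookup vector: alias -> full name
keep <- !is.na(keys)
lookup <- setNames(vals[keep], keys[keep])

# Index the lookup by name; values with no match come back as NA
result <- unname(lookup[test.vector])
result
```

This assumes each alias belongs to only one person; if an alias appeared under two names, indexing by name would return the first occurrence.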
Upvotes: 1