Reputation: 2989
I have a data frame match_df
which shows "matching rules": the column old
should be replaced with the colum new
in the dataframes it is applied on.
old <- c("10000","20000","300ZZ","40000")
new <- c("Name1","Name2","Name3","Name4")
match_df <- data.frame(old,new)
old new
1 10000 Name1
2 20000 Name2
3 300ZZ Name3 # watch the letters
4 40000 Name4
I want to apply the matching rules above on a data frame working_df
id <- c(1,2,3,4)
value <- c("xyz-10000","20000","300ZZ-230002112","40")
working_df <- data.frame(id,value)
id value
1 1 xyz-10000
2 2 20000
3 3 300ZZ-230002112
4 4 40
My desired result is
# result
id value
1 1 Name1
2 2 Name2
3 3 Name3
4 4 40
This means that I am not looking for an exact match. I'd rather like to replace the whole string working_df$value
as soon as it includes any part of the string in match_df$old
.
I like the solution posted in R: replace characters using gsub, how to create a function?, but it works only for exact matches. I experimented with gsub
, str_replace_all
from stringr
but I couldn't find a solution that works for me. There are many solutions for exact matches on SOF, but I couldn't find a comprehensible one for this problem.
Any help is highly appreciated.
Upvotes: 0
Views: 152
Reputation: 109924
Here are 2 approaches using Map
+ <<-
and a for
loop:
working_df[["value2"]] <- as.character(working_df[["value"]])
Map(function(x, y){working_df[["value2"]][grepl(x, working_df[["value2"]])] <<- y}, old, new)
working_df
## id value value2
## 1 1 xyz-10000 Name1
## 2 2 20000 Name2
## 3 3 300ZZ-230002112 Name3
## 4 4 40 40
## or...
working_df[["value2"]] <- as.character(working_df[["value"]])
for (i in seq_along(working_df[["value2"]])) {
working_df[["value2"]][grepl(old[i], working_df[["value2"]])] <- new[i]
}
Upvotes: 0
Reputation: 21425
I'm not sure this is the most elegant/efficient way of doing it but you could try something like this:
working_df$value <- sapply(working_df$value,function(y){
idx<-which(sapply(match_df$old,function(x){grepl(x,y)}))[1]
if(is.na(idx)) idx<-0
ifelse(idx>0,as.character(match_df$new[idx]),as.character(y))
})
It uses grepl
to find, for each value of working_df
, if there is a row of match_df
that is partially matching and get the index of that row. If there is more than one, it takes the first one.
Upvotes: 1
Reputation: 7288
You need the grep
function. This will return the indices of a vector that match a pattern (any pattern, not necessarily a full string match). For instance, this will tell you which of your "old" values match the "10000" pattern:
grep(match_df[1,1], working_df$value)
Once you have that information, you can look up the corresponding "new" value for that pattern, and replace it on the matching rows.
Upvotes: 0