Reputation: 10538
I have a df:
df <- data.frame(
x=c("ABC Inc", "DCV", "FGZ", "JH7 j11"),
y=c("ABC - fasjdlkjs", "DCV . (INC) .. kdhkfhksf", "FGZ / qiuwy72gs", "JH7 j11 dhd"),
target=c("fasjdlkjs", "inc kdhkfhksf", "qiuwy gs", "dhd")
)
Where x is a close, but not exact, subset of y. I want to gsub() everything in x to "" (blank) in y, while also removing numbers/punctuation.
My desired output is stored in target.
I thought this would have worked, but it didn't:
df <- mutate(target = gsub(pattern=x, replacement="", y))
EDIT:
Sort of: Y - X = Target
Upvotes: 1
Views: 106
Reputation: 32426
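This (now, thanks @Frank) also converts the result to lower case with tolower. Below, s builds the pattern to test against from the x column: each x string is split on spaces, and the pieces are joined with "|" together with [[:punct:]], so any word from x or any punctuation character is removed from y. Digits are then replaced with a space.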
df$res <- mapply(function(a, b) {
s <- paste(c(unlist(strsplit(as.character(a)," ")), "[[:punct:]]"), collapse="|")
tolower(gsub("[[:digit:]]+", " ", gsub(s, "", b)))
}, df$x, df$y)
df
#         x                        y        target           res
# 1 ABC Inc          ABC - fasjdlkjs     fasjdlkjs     fasjdlkjs
# 2     DCV DCV . (INC) .. kdhkfhksf inc kdhkfhksf inc kdhkfhksf
# 3     FGZ          FGZ / qiuwy72gs      qiuwy gs      qiuwy gs
# 4 JH7 j11              JH7 j11 dhd           dhd           dhd
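One thing the printed data frame hides: removing the matched words and punctuation leaves the surrounding spaces behind, so res can carry leading or doubled spaces and may not compare equal to target. Row-wise application is needed in the first place because gsub() only uses the first element of a vector pattern. Here is a minimal sketch of a variant that also tidies whitespace. The helper name strip_words is hypothetical (not from the answer above), and it assumes the words in x contain no regex metacharacters.

```r
# Hypothetical helper: remove the words of `a` plus punctuation and digits
# from `b`, then collapse and trim the leftover whitespace.
strip_words <- function(a, b) {
  # Alternation of the words in `a` and the punctuation class,
  # e.g. "ABC|Inc|[[:punct:]]"
  s <- paste(c(unlist(strsplit(as.character(a), " ")), "[[:punct:]]"),
             collapse = "|")
  out <- gsub(s, "", as.character(b))
  out <- gsub("[[:digit:]]+", " ", out)  # each run of digits becomes one space
  out <- gsub("[[:space:]]+", " ", out)  # collapse runs of whitespace
  tolower(trimws(out))                   # drop leading/trailing spaces
}

df <- data.frame(
  x = c("ABC Inc", "DCV", "FGZ", "JH7 j11"),
  y = c("ABC - fasjdlkjs", "DCV . (INC) .. kdhkfhksf",
        "FGZ / qiuwy72gs", "JH7 j11 dhd"),
  target = c("fasjdlkjs", "inc kdhkfhksf", "qiuwy gs", "dhd"),
  stringsAsFactors = FALSE
)

# Apply pairwise over the rows; unname() drops the names mapply adds.
df$res <- unname(mapply(strip_words, df$x, df$y))
identical(df$res, df$target)
```

If x can contain characters such as "." or "(", they would be read as regex syntax in s and would need escaping first.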
Upvotes: 3