Reputation: 109
Here is a sample dataframe:
a <- c("cat", "dog", "mouse")
b <- c("my cat is a tabby cat and is a friendly cat", "walk the dog", "the mouse is scared of the other mouse")
df <- data.frame(a,b)
I'd like to be able to remove the second occurrence of the value in col a in col b.
Here is my desired output:
a b
cat my cat is a tabby and is a friendly cat
dog walk the dog
mouse the mouse is scared of the other
I've tried different combinations of gsub and some stringr functions, but I haven't even gotten close to being able to remove the second (and only the second) occurrence of the string in col a in col b. I think I'm asking something similar to this one, but I'm not familiar with Perl and couldn't translate it to R.
Thanks!
Reputation: 5788
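Base R, split-apply-combine solution. Note that unique() drops every repeated word in b, not only the second occurrence of the value in a. As a result, the first row becomes "my cat is a tabby and friendly" and the third becomes "the mouse is scared of other", so neither matches the desired output: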
# Split-apply-combine:
data.frame(do.call("rbind", lapply(split(df, df$a), function(x) {
  b <- paste(unique(unlist(strsplit(x$b, "\\s+"))), collapse = " ")
  return(data.frame(a = x$a, b = b))
})),
stringsAsFactors = FALSE, row.names = NULL)
Data:
df <- data.frame(a = c("cat", "dog", "mouse"),
b = c("my cat is a tabby cat and is a friendly cat", "walk the dog", "the mouse is scared of the other mouse"),
stringsAsFactors = FALSE)
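Below is a minimal sketch that keeps the split-apply-combine shape but removes only the second whole-word match of each row's a. It assumes words are separated by whitespace, so punctuation attached to a word would prevent a match. It also splits by row rather than by a, which preserves the original row order and copes with repeated values in a. Both choices are assumptions, not part of the answer above.

```r
# Sample data from the question
df <- data.frame(a = c("cat", "dog", "mouse"),
                 b = c("my cat is a tabby cat and is a friendly cat",
                       "walk the dog",
                       "the mouse is scared of the other mouse"),
                 stringsAsFactors = FALSE)

# Split: one piece per row (keeps row order, tolerates repeated values in a)
pieces <- split(df, seq_len(nrow(df)))

# Apply: drop only the second whole-word occurrence of this row's a
fixed_pieces <- lapply(pieces, function(x) {
  words <- unlist(strsplit(x$b, "\\s+"))
  hits <- which(words == x$a)                      # positions of words equal to a
  if (length(hits) >= 2) words <- words[-hits[2]]  # remove the second one only
  data.frame(a = x$a, b = paste(words, collapse = " "),
             stringsAsFactors = FALSE)
})

# Combine: bind the rows back together and reset the row names
res <- do.call("rbind", fixed_pieces)
rownames(res) <- NULL
res
```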
Reputation: 18425
You could do this...
library(stringr)
df$b <- str_replace(df$b,
paste0("(.*?",df$a,".*?) ",df$a),
"\\1")
df
a b
1 cat my cat is a tabby and is a friendly cat
2 dog walk the dog
3 mouse the mouse is scared of the other
The regex finds the first string of characters with df$a somewhere in it, followed by a space and another df$a. The capture group, indicated by the (...), is the text up to the space before the second occurrence. The whole match, including the second occurrence, is replaced by the capture group \\1, which deletes the second df$a and its preceding space. Anything after the second df$a is not affected.
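The same capture-group idea can be written in base R. Because sub() accepts only a single pattern, the sketch below pairs each a with its b via mapply(). It also adds two things not in the answer above. The \\b word boundaries stop "cat" from matching inside a word such as "category". The PCRE \\Q...\\E quoting treats the value of a literally, which assumes a never contains the sequence \E. remove_second() is a hypothetical helper name.

```r
# Sample data from the question
df <- data.frame(a = c("cat", "dog", "mouse"),
                 b = c("my cat is a tabby cat and is a friendly cat",
                       "walk the dog",
                       "the mouse is scared of the other mouse"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: delete the second whole-word occurrence of `word` in `text`,
# together with the whitespace in front of it.
# \\Q...\\E makes PCRE treat `word` literally; \\b adds word boundaries.
remove_second <- function(word, text) {
  pat <- paste0("(.*?\\b\\Q", word, "\\E\\b.*?)\\s+\\b\\Q", word, "\\E\\b")
  # Keep only capture group 1 (everything up to the space before the second match)
  sub(pat, "\\1", text, perl = TRUE)
}

# sub() takes one pattern, so pair each row's a with its own b
df$b <- mapply(remove_second, df$a, df$b, USE.NAMES = FALSE)
df
```

For example, remove_second("cat", "cat category cat") returns "cat category", because the "cat" inside "category" is not a whole word.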
Reputation: 109
I've actually found another solution that, though longer, may be clearer for other regex beginners:
library(stringr)
# Replace the first instance of col a in col b with "INTERIM"
df$b <- str_replace(df$b, df$a, "INTERIM")
# Now that the original first instance of col a is re-labeled to "INTERIM", replace the first remaining instance of col a in col b (the original second one), this time with an empty string
df$b <- str_replace(df$b, df$a, "")
# Restore "INTERIM" to the original string in col a
df$b <- str_replace(df$b, "INTERIM", df$a)
# Trim the ends, then collapse the leftover "double" whitespace to a single space
df$b <- gsub("\\s+", " ", str_trim(df$b))
df
a b
cat my cat is a tabby and is a friendly cat
dog walk the dog
mouse the mouse is scared of the other
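The INTERIM placeholder works as long as that string never appears in the text itself. The sketch below uses a different technique that avoids the placeholder. It finds the match positions with gregexpr(), then cuts out the second match, plus the whitespace before it, with substr(). It treats the value of a as a literal string (fixed = TRUE) and, like the str_replace() steps above, matches substrings rather than whole words. drop_second_match() is a hypothetical helper name.

```r
# Sample data from the question
df <- data.frame(a = c("cat", "dog", "mouse"),
                 b = c("my cat is a tabby cat and is a friendly cat",
                       "walk the dog",
                       "the mouse is scared of the other mouse"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: delete the second match of `word` in `text`,
# plus any whitespace directly in front of it.
drop_second_match <- function(word, text) {
  # All match start positions (fixed = TRUE: treat `word` literally, not as a regex)
  pos <- gregexpr(word, text, fixed = TRUE)[[1]]
  if (length(pos) < 2) return(text)  # fewer than two matches: leave unchanged
  start <- pos[2]
  end <- start + attr(pos, "match.length")[2] - 1
  # Text before the second match, with its trailing whitespace removed
  before <- sub("\\s+$", "", substr(text, 1, start - 1))
  # Text after the second match, kept as-is
  after <- substr(text, end + 1, nchar(text))
  paste0(before, after)
}

df$b <- mapply(drop_second_match, df$a, df$b, USE.NAMES = FALSE)
df
```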
Reputation: 37651
It takes a little work to build the right regex:
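# Alternation of every value in a: "cat|dog|mouse"
P1 <- paste(a, collapse = "|")
# Group 2 captures the matched word, (\\2) matches its next occurrence, and only group 1 is kept
PAT <- paste0("((", P1, ").*?)(\\2)")
sub(PAT, "\\1", b, perl = TRUE)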
[1] "my cat is a tabby  and is a friendly cat"
[2] "walk the dog"
[3] "the mouse is scared of the other "
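As written, sub(PAT, "\\1", b, perl = TRUE) keeps the whitespace that preceded the removed word. That is why the third result ends in a space. The small variant below is an assumption, not the answerer's code. It puts \\s* before the back-reference so that this whitespace is consumed too. The alternation still contains every value of a, so each string in b is searched for any of them. Pairing each b with its own a, for example with mapply(), avoids that.

```r
a <- c("cat", "dog", "mouse")
b <- c("my cat is a tabby cat and is a friendly cat", "walk the dog",
       "the mouse is scared of the other mouse")

# Alternation of every value in a: "cat|dog|mouse"
P1 <- paste(a, collapse = "|")
# \\s* before the back-reference \\2 also consumes the whitespace preceding the repeat
PAT <- paste0("((", P1, ").*?)\\s*\\2")
out <- sub(PAT, "\\1", b, perl = TRUE)
out
```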