lawyeR

Reputation: 7664

Remove multiple instances of a pattern with a regex, but not the text between instances

In long passages written with bookdown, I have inserted numerous images. Having combined the passages into a single character string (in a data frame), I want to remove the Markdown syntax that inserts the images, but not the text between those images. Here is a toy example.

library(stringr)

text.string <- "writing ![Stairway scene](/media/ClothesFairLady.jpg) writing to keep ![Second scene](/media/attire.jpg) more writing"

str_remove_all(string = text.string, pattern = "!\\[.+\\)")
[1] "writing  more writing"

The regex doesn't stop at the first closing parenthesis; it continues to the last one and deletes "writing to keep" in between.
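A quick way to see the problem is to extract, rather than remove, what the pattern matches. In this minimal sketch using stringr's `str_extract_all()`, the greedy `.+` produces a single match that runs from the first `![` to the last `)`, swallowing the text in between.

```r
library(stringr)

text.string <- "writing ![Stairway scene](/media/ClothesFairLady.jpg) writing to keep ![Second scene](/media/attire.jpg) more writing"

# Greedy: .+ consumes as much as possible, then backtracks only to the LAST ")"
greedy <- str_extract_all(text.string, "!\\[.+\\)")[[1]]

print(greedy)
print(length(greedy))  # one match covering both images and the text between them
```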

I tried to apply the answers to "String manipulation in R: remove specific pattern in multiple places without removing text in between instances of the pattern", which use gsubfn and gsub, but was unable to get those solutions to work.

Please point me in the right direction: how can I use a regex to remove the designated strings without removing the characters between them? I would prefer a stringr solution, but anything that works is welcome. Thank you.

Upvotes: 0

Views: 999

Answers (2)

Anoushiravan R

Reputation: 21938

I think you could use the following solution too:

gsub("!\\[[^][]*\\]\\([^()]*\\)", "", text.string)

[1] "writing  writing to keep  more writing"
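If you prefer stringr, the same pattern idea should carry over to `str_remove_all()`. One assumption to flag: the `[^][]` trick relies on the POSIX rule that `]` is literal right after `[^`, while stringr uses the ICU regex engine, so this sketch escapes the square brackets explicitly instead of relying on that rule.

```r
library(stringr)

text.string <- "writing ![Stairway scene](/media/ClothesFairLady.jpg) writing to keep ![Second scene](/media/attire.jpg) more writing"

# Markdown image: "![", alt text without square brackets, "](", path without parentheses, ")"
# (explicit escapes inside the class are an assumption made for the ICU engine)
image.pattern <- "!\\[[^\\[\\]]*\\]\\([^()]*\\)"

cleaned <- str_remove_all(text.string, image.pattern)
print(cleaned)
```

Because neither character class can cross a `]` or `)`, each match is confined to a single image, whatever the quantifier's greediness.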

Upvotes: 0

koolmees

Reputation: 2783

You can use the following regex:

"!\\[[^\\)]+\\)"

Alternatively, you can use this:

"!\\[.*?\\)"

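Both avoid the greedy overrun, which is the key to your question. The first uses a negated character class, so a match cannot extend past the first closing parenthesis; the second uses the lazy quantifier `*?`, which stops at the first `)` it reaches.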
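A minimal sketch checking both patterns with stringr, then applying one to a data frame column. The data frame `df` and its column `text` are hypothetical stand-ins for the asker's data; `str_squish()` is added only to collapse the double spaces left where each image was removed.

```r
library(stringr)

text.string <- "writing ![Stairway scene](/media/ClothesFairLady.jpg) writing to keep ![Second scene](/media/attire.jpg) more writing"

# Negated class: [^\\)]+ cannot match ")", so each match ends at the first one
neg.class <- str_remove_all(text.string, "!\\[[^\\)]+\\)")

# Lazy quantifier: .*? expands only until the next ")" can match
lazy <- str_remove_all(text.string, "!\\[.*?\\)")

print(neg.class)
print(lazy)

# Hypothetical data frame holding the combined passages in a column named `text`
df <- data.frame(text = text.string, stringsAsFactors = FALSE)
df$text <- str_squish(str_remove_all(df$text, "!\\[.*?\\)"))
print(df$text)
```

Both patterns stop at the first `)` after `![`, so an image path that itself contained a closing parenthesis would be cut short; the toy example has none.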

Upvotes: 1
