Reputation: 1217
Say I have a dataframe df in which a column df$strings contains strings like
[cat 00.04;09]
[cat 00.04;10]
and so on. I want to remove all characters between "[cat" and "]" to yield
[cat]
[cat]
I've tried this using gsub but it's not working and I'm not sure what I'm doing wrong:
gsub('cat*?\\]', '', df)
Upvotes: 1
Views: 480
Reputation: 626689
Note that the `cat*?\\]` pattern matches `ca`, then zero or more `t` characters (as few as possible), and then `]`.
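As a quick check of that reading, here is a minimal sketch using `grepl`. Only the first test string comes from the question; the other three are made up for illustration.

```r
# Sample inputs: the first is from the question, the rest are hypothetical
x <- c("[cat 00.04;09]", "[cat]", "[ca]", "[catt]")

# 'ca', then zero or more 't' (lazily), then a literal ']'
hits <- grepl('cat*?\\]', x)
print(hits)
```

The first string should come back `FALSE`: a space and digits sit between `cat` and `]`, so the pattern never matches and `gsub` leaves the string unchanged.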
You want to match any characters other than `]` between `[cat` and `]`:
gsub('\\[cat[^]]*\\]', '[cat]', df$strings)
Here,
`\\[` - matches `[`
`cat` - matches `cat`
`[^]]*` - 0+ chars other than `]` (note that `]` inside the bracket expression should not be escaped when placed at the start; if you escape it, you will need to add the `perl=TRUE` argument, since the PCRE regex engine can handle regex escapes inside bracket expressions, unlike the default TRE)
`\\]` - a `]` (you do not even need to escape it, you may just use `]`).
See the R demo:
x <- c("[cat 00.04;09]", "[cat 00.04;10]")
gsub('\\[cat[^]]*\\]', '[cat]', x)
## => [1] "[cat]" "[cat]"
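To apply this to the data frame from the question, the column itself (not the whole `df`) should be passed to `gsub` and the result assigned back. The sketch below assumes a hypothetical `df` built to mirror the question, and also shows the two alternative spellings mentioned above: the unescaped closing `]`, and the escaped `]` inside the bracket expression with `perl = TRUE`.

```r
# Hypothetical data frame mirroring the question
df <- data.frame(strings = c("[cat 00.04;09]", "[cat 00.04;10]"),
                 stringsAsFactors = FALSE)

# Apply the replacement to the column and assign it back
df$strings <- gsub('\\[cat[^]]*\\]', '[cat]', df$strings)
print(df$strings)

# Equivalent spellings of the same pattern
x <- c("[cat 00.04;09]", "[cat 00.04;10]")
unescaped <- gsub('\\[cat[^]]*]', '[cat]', x)               # closing ] left unescaped
pcre <- gsub('\\[cat[^\\]]*\\]', '[cat]', x, perl = TRUE)   # escaped ] inside brackets, PCRE
```

All three calls should give the same `[cat]` results; which spelling to use is mostly a matter of readability.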
If `cat` can be any word, use
gsub('\\[(\\w+)[^]]*\\]', '[\\1]', x)
where `(\\w+)` is a capturing group with ID=1 that matches 1 or more word chars, and `\\1` in the replacement pattern is a replacement backreference that stands for the group value.
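A small sketch of the capturing-group version on inputs with different leading words. The `dog` and `bird` strings are made up for illustration.

```r
# Made-up inputs with different leading words
x <- c("[cat 00.04;09]", "[dog 11.02;03]", "[bird]")

# Keep the captured word, drop everything else up to the closing ]
out <- gsub('\\[(\\w+)[^]]*\\]', '[\\1]', x)
print(out)
```

Each element should keep only its own leading word inside the brackets, and a string that is already in the target form, such as `[bird]`, stays as it is.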
Upvotes: 4