remove all characters between string and bracket in R

Question

Say I have a dataframe df in which a column df$strings contains strings like

[cat 00.04;09]
[cat 00.04;10]

and so on. I want to remove all characters between "[cat" and "]" to yield

[cat]
[cat]

I've tried this using gsub but it's not working and I'm not sure what I'm doing wrong:

gsub('cat*?\\]', '', df)

Wiktor Stribiżew · Accepted Answer

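Note that the pattern cat*?\] matches ca, then zero or more t characters (as few as possible), and then ]. It never consumes the characters between cat and ], so it finds nothing to replace in your strings.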
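To see why the original attempt leaves the strings untouched, here is a small sketch. The vector x is an assumed stand-in for df$strings, and the second call uses a made-up input, "[cat]", only to show the kind of text the pattern would match.

```r
# x is an assumed stand-in for df$strings
x <- c("[cat 00.04;09]", "[cat 00.04;10]")

# In these strings "cat" is followed by a space, not by "]",
# so the pattern finds no match and nothing is replaced
unchanged <- gsub('cat*?\\]', '', x)
print(identical(unchanged, x))

# The pattern only matches when "ca" plus optional "t"s sits right before "]"
only_adjacent <- gsub('cat*?\\]', '', "[cat]")
print(only_adjacent)
```

The first call returns x as it was; the second strips "cat]" and leaves only the opening bracket.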

You want to match any chars other than ] between [cat and ]:

gsub('\\[cat[^]]*\\]', '[cat]', df$strings)

Here,

\\[ - matches a literal [
cat - matches cat
[^]]* - zero or more characters other than ] (note that ] does not need escaping inside a bracket expression when it is placed first; if you do escape it, you need to add the perl=TRUE argument, because the PCRE engine handles regex escapes inside bracket expressions while the default TRE engine does not)
\\] - matches a literal ] (you do not even need to escape it; a plain ] works too).

See the R demo:

x <- c("[cat 00.04;09]", "[cat 00.04;10]")
gsub('\\[cat[^]]*\\]', '[cat]', x)
## => [1] "[cat]" "[cat]"
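As a sketch of the alternative mentioned in the note on escaping: if you prefer to escape ] inside the bracket expression, the same replacement can be run through the PCRE engine with perl=TRUE. The vector x is redefined here so the snippet stands alone.

```r
x <- c("[cat 00.04;09]", "[cat 00.04;10]")

# "\\]" inside the bracket expression is read as an escaped "]" by PCRE,
# so perl = TRUE is needed for this spelling of the pattern
escaped <- gsub('\\[cat[^\\]]*\\]', '[cat]', x, perl = TRUE)
print(escaped)
```

Both spellings give the same result here; the unescaped [^]]* form simply avoids the need to switch engines.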

If cat can be any word, use

gsub('\\[(\\w+)[^]]*\\]', '[\\1]', x)

where (\\w+) is a capturing group with ID=1 that matches one or more word characters, and \\1 in the replacement pattern is a backreference that stands for the text captured by that group.
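A minimal sketch of the capturing-group version applied to a data frame column, as in the question. The data frame below and its second value, "[dog 11.02;03]", are made up for illustration.

```r
# Hypothetical data frame; the "dog" row is invented to show a different word
df <- data.frame(
  strings = c("[cat 00.04;09]", "[dog 11.02;03]"),
  stringsAsFactors = FALSE
)

# Keep the first word after "[" (group 1) and drop everything up to "]"
df$strings <- gsub('\\[(\\w+)[^]]*\\]', '[\\1]', df$strings)
print(df$strings)
```

Note that the column df$strings is passed to gsub, not the whole data frame, and the result is assigned back to the column.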
