DroppingOff

Reputation: 331

Replace exact string match with regexp in R

I have a vector of strings that needs cleaning. I have been able to clean most of it on my own, but I am having a problem with one thing.

Some strings have a prefix like '@56;' at the beginning (the numbers vary). So a string can be '@56;trousers' or '@897;trousers', and I would like to reduce it to just 'trousers'.

I have written the following code:

gsub("[@[:digit:];]", "", 'mystring')   

but it fails in cases like:

gsub("[@[:digit:];]", "", '@34skirt') # returns 'skirt'

I would like it to return '@34skirt' in this case, because the ; is missing after the digits.

I want an exact match. Any ideas about how to do this? I have tried adding \ and it does not work.

Upvotes: 1

Views: 968

Answers (2)

akrun

Reputation: 886948

We can try sub with a pattern that matches the whole sequence (an @, one or more digits, then a ;):

sub("@\\d+;", "", v1)
#[1] "mystring" "@34skirt" "trousers" "trousers"

data

v1 <- c('mystring', '@34skirt',  '@56;trousers', '@897;trousers') 
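Note that sub replaces only the first match in each string, while gsub replaces every match. A minimal sketch of the difference, using a made-up string y with two prefixes (not part of the original data):

```r
# Hypothetical input with two '@<digits>;' sequences in a row
y <- "@1;@22;socks"

first_only <- sub("@\\d+;", "", y)   # removes only the first match
all_matches <- gsub("@\\d+;", "", y) # removes every match

print(first_only)   ## [1] "@22;socks"
print(all_matches)  ## [1] "socks"
```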

Upvotes: 2

Wiktor Stribiżew

Reputation: 626690

The [@[:digit:];] regex matches a single character that is either a @, or a digit, or a ;. Thus, it will remove those at any position in the string, as many times as it finds them with gsub.
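To see this on a string where those characters are scattered around, here is a small sketch (the input string is made up for illustration):

```r
# A character class matches one character at a time, so gsub strips every
# '@', digit, and ';' wherever it occurs, not just a leading '@<digits>;'.
x <- "size;42@home"   # hypothetical input
res <- gsub("[@[:digit:];]", "", x)
print(res)            ## [1] "sizehome"
```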

You may use a regex defining a sequence of characters to remove, not a character class:

@[0-9]+;

You can even tell the regex engine to remove those only at the beginning of the string:

^@[0-9]+;

Sample demo:

sub("^@[0-9]+;", "", '@34skirt')     ## [1] "@34skirt"
sub("^@[0-9]+;", "", '@34;trousers') ## [1] "trousers"
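Since sub is vectorized, both patterns can be compared on a whole vector at once. The sketch below uses made-up inputs, including one where the sequence appears in the middle of the string, to show what the ^ anchor changes:

```r
# Hypothetical inputs: a normal prefix, a prefix missing its ';',
# and a '@<digits>;' sequence in the middle of the string
x <- c("@56;trousers", "@34skirt", "blue@12;jeans")

unanchored <- sub("@[0-9]+;", "", x)   # matches anywhere in the string
anchored   <- sub("^@[0-9]+;", "", x)  # matches only at the start

print(unanchored)  ## [1] "trousers"  "@34skirt"  "bluejeans"
print(anchored)    ## [1] "trousers"      "@34skirt"      "blue@12;jeans"
```

Which one fits depends on whether the sequence can legitimately occur mid-string in the data; the question only mentions it at the beginning, so the anchored version is the stricter choice.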

Upvotes: 2
