Reputation: 331
I have a vector of strings that needs cleaning. I have been able to clean most of it on my own, but I am having problems with one thing.
Some strings have a prefix such as '@56;' at the beginning (the numbers vary). So a string can be '@56;trousers' or '@897;trousers', and I would like to reduce it to just 'trousers'.
I have written the following code:
gsub("[@[:digit:];]", "", 'mystring')
but it fails in cases like:
gsub("[@[:digit:];]", "", '@34skirt') # returns 'skirt'
I would like it to return '@34skirt' in this case because the ; is missing from the end.
I want an exact match. Any ideas about how to do this? I have tried adding \ and it does not work.
Upvotes: 1
Views: 968
Reputation: 886948
We can try
v1 <- c('mystring', '@34skirt', '@56;trousers', '@897;trousers')
sub("@\\d+;", "", v1)
#[1] "mystring" "@34skirt" "trousers" "trousers"
Upvotes: 2
Reputation: 626690
The [@[:digit:];] regex matches a single character that is either a @, or a digit, or a ;. Thus, it will remove those at any position in the string, as many times as it finds them with gsub.
You may use a regex defining a sequence of characters to remove, not a character class:
@[0-9]+;
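To make the difference concrete, here is a small sketch comparing the character class with the sequence pattern. The vector x and its third element are made up for illustration and are not from the question:

```r
# Hypothetical sample data; the third element has the characters scattered
# through the string rather than forming an '@digits;' prefix.
x <- c("@56;trousers", "@34skirt", "size;42@home")

# Character class: every '@', digit, or ';' is removed wherever it occurs.
class_result <- gsub("[@[:digit:];]", "", x)
print(class_result)
# [1] "trousers" "skirt"    "sizehome"

# Sequence: only '@', then one or more digits, then ';' is removed.
seq_result <- sub("@[0-9]+;", "", x)
print(seq_result)
# [1] "trousers"     "@34skirt"     "size;42@home"
```

The sequence pattern leaves '@34skirt' alone because there is no ; after the digits, which is the behaviour asked for in the question.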
You can even tell the regex engine to remove those only at the beginning of the string:
^@[0-9]+;
sub("^@[0-9]+;", "", '@34skirt') ## [1] "@34skirt"
sub("^@[0-9]+;", "", '@34;trousers') ## [1] "trousers"
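If the cleaning has to be applied to a whole vector, the anchored pattern could be wrapped in a small helper. This is only a sketch: strip_prefix and the vector v are names invented here, and the last element is an assumed case added to show what the ^ anchor changes:

```r
# Hypothetical helper: drop a leading '@digits;' prefix, and nothing else.
strip_prefix <- function(x) sub("^@[0-9]+;", "", x)

# Sample data; "blue@12;jeans" is an assumed case with the sequence mid-string.
v <- c("@56;trousers", "@897;trousers", "@34skirt", "blue@12;jeans")

anchored <- strip_prefix(v)
print(anchored)
# [1] "trousers"      "trousers"      "@34skirt"      "blue@12;jeans"

# Without the anchor, the first '@digits;' is removed wherever it appears.
unanchored <- sub("@[0-9]+;", "", v)
print(unanchored)
# [1] "trousers"  "trousers"  "@34skirt"  "bluejeans"
```

Whether the anchor is wanted depends on the data: it is the safer choice if the sequence should only ever be treated as a prefix.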
Upvotes: 2