Reputation: 307
This is supposed to be very simple. For example:
a = "fafasdf..", b = "sdfs?>", c = "safwe"
The result I want is:
a = "fafasdf", b = "sdfs", c = "safwe"
How do I remove the last few characters if they are punctuation?
I tried sub("[:punct:]\Z", "", mystring), but it does not work.
Upvotes: 1
Views: 291
Reputation: 70732
POSIX character classes need to be wrapped inside a bracket expression; the correct syntax is [[:punct:]]. And since you're not using gsub to remove all instances, you need a quantifier (such as +) to match more than one occurrence of punctuation.
As noted in the other answer, the perl = TRUE parameter needs to be set to use \Z.
For future reference, and not to dissuade you: this anchor behaves differently depending on the regex engine being used. In R with perl = TRUE set, \Z also allows a match just before a final line break. It's fine to use it here, but I would just stick to $ instead.
sub('[[:punct:]]+$', '', c('fafasdf..', 'sdfs?>', 'safwe'))
## [1] "fafasdf" "sdfs" "safwe"
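To illustrate the remark about \Z allowing a match before a final line break, here is a small sketch. The input string is a made-up example, and \z (lowercase, which matches only at the very end of the string) is brought in purely for contrast; it is not part of the original question.

```r
# Hypothetical input that ends in a line break
x <- "fafasdf..\n"

# \Z matches at the very end OR just before a final newline,
# so the trailing punctuation is removed and the newline is kept
res_Z <- sub("[[:punct:]]+\\Z", "", x, perl = TRUE)

# \z matches only at the very end of the string,
# so nothing matches here and the input comes back unchanged
res_z <- sub("[[:punct:]]+\\z", "", x, perl = TRUE)

identical(res_Z, "fafasdf\n")  # TRUE
identical(res_z, x)            # TRUE
```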
Also take the locale into account; it can affect the behavior of the POSIX class. If this becomes an issue, you can read up on this previously answered question.
If you just want to remove non-word characters, you could use:
sub('\\W+$', '', c('fafasdf..', 'sdfs?>', 'safwe'))
## [1] "fafasdf" "sdfs" "safwe"
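The two patterns are not interchangeable, though. A quick sketch with made-up inputs (not from the question) shows where they differ: the underscore counts as punctuation but is also a word character, and whitespace is a non-word character but not punctuation.

```r
# Hypothetical inputs: one ends in "_", the other in ". " (dot + space)
y <- c("name_", "word. ")

# [[:punct:]] includes "_" but not the space, so the second
# string has no run of punctuation touching the end
res_punct <- sub("[[:punct:]]+$", "", y)
res_punct
## [1] "name"   "word. "

# \W excludes "_" (a word character) but includes "." and the space
res_nonword <- sub("\\W+$", "", y)
res_nonword
## [1] "name_" "word"
```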
Upvotes: 2
Reputation: 174706
You're almost there,
sub("[[:punct:]]+$", "", mystring)
You need to put [:punct:] inside a character class and make it repeat one or more times by adding + after it. Also replace \Z with $, since sub without the perl=TRUE parameter won't support \Z (which matches at the end of the string).
Example:
x <- c("fafasdf..", "sdfs?>", "safwe")
sub("[[:punct:]]+$", "", x)
# [1] "fafasdf" "sdfs" "safwe"
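To see why the + matters, here is a minimal comparison on the same vector, with and without the quantifier. The variable names are only for illustration.

```r
x <- c("fafasdf..", "sdfs?>", "safwe")

# Without "+", only the single last punctuation character is removed
res_one <- sub("[[:punct:]]$", "", x)
res_one
# [1] "fafasdf." "sdfs?"    "safwe"

# With "+", the whole trailing run of punctuation is removed
res_many <- sub("[[:punct:]]+$", "", x)
res_many
# [1] "fafasdf" "sdfs"    "safwe"
```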
If you really want to use \\Z, then enable the perl=TRUE parameter.
sub("[[:punct:]]+\\Z", "", x, perl=TRUE)
# [1] "fafasdf" "sdfs" "safwe"
Upvotes: 4