user3697665

How to remove the last character of a string if it is a punctuation?

It's supposed to be very simple. For example:

a = "fafasdf..", b = "sdfs?>", c = "safwe"

The result I want is:

a = "fafasdf", b = "sdfs", c = "safwe"

How do I remove the last few characters if they are punctuation? I tried sub("[:punct:]\Z", "", mystring), but it does not work...

Answers (2)

hwnd

POSIX character classes need to be wrapped inside a bracket expression, so the correct syntax is [[:punct:]]. And since the pattern is anchored to the end of the string, you need a quantifier (+) to match more than one trailing punctuation character; without it, only the final character is removed, whether you use sub or gsub.
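To see why the quantifier matters, here is a small comparison, a sketch using the strings from the question, of the end-anchored pattern with and without +:

```r
x <- c("fafasdf..", "sdfs?>", "safwe")

# Without a quantifier, [[:punct:]]$ matches only the single last character
one <- sub("[[:punct:]]$", "", x)            # c("fafasdf.", "sdfs?", "safwe")

# With +, the whole trailing run of punctuation is matched and removed
all_trailing <- sub("[[:punct:]]+$", "", x)  # c("fafasdf", "sdfs", "safwe")

print(one)
print(all_trailing)
```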

As noted in the other answer, the perl = TRUE argument needs to be set to use \Z.

But for future reference (not to dissuade you), this anchor behaves differently depending on the regex engine being used. In R, with perl = TRUE set, \Z also allows a match just before a final line break. It's fine to use it here, but I would just stick to $ instead.

sub('[[:punct:]]+$', '', c('fafasdf..', 'sdfs?>', 'safwe'))
## [1] "fafasdf" "sdfs"    "safwe"
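The final-line-break behaviour of \Z can be seen by comparing it with PCRE's \z, which matches only at the very end of the string. This is a sketch assuming perl = TRUE so that both anchors are understood; \z is an addition for contrast, not something the answer above relies on:

```r
s <- "fafasdf..\n"  # a string whose last character is a newline

# \Z may match just before a final "\n": the dots are removed, the newline is kept
with_Z <- sub("[[:punct:]]+\\Z", "", s, perl = TRUE)  # "fafasdf\n"

# \z only matches at the absolute end, right after "\n" (not punctuation),
# so the pattern finds no match and the string is returned unchanged
with_z <- sub("[[:punct:]]+\\z", "", s, perl = TRUE)  # "fafasdf..\n"
```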

Also take the locale into account; it can affect which characters the POSIX class matches. If this becomes an issue, you can read up on this previously answered question.

If you just want to remove trailing non-word characters, you could use:

sub('\\W+$', '', c('fafasdf..', 'sdfs?>', 'safwe'))
## [1] "fafasdf" "sdfs"    "safwe"
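The two patterns are not interchangeable once whitespace or underscores appear at the end. A sketch with made-up inputs, assuming a typical C or UTF-8 locale where "_" counts as punctuation (per the locale caveat above, this can vary):

```r
y <- c("abc. ", "abc_", "abc?!")

# [[:punct:]] stops at a trailing space, but treats "_" as punctuation
p <- sub("[[:punct:]]+$", "", y)  # c("abc. ", "abc", "abc")

# \W removes any trailing run of non-word characters (including spaces),
# but keeps "_" because it is a word character
w <- sub("\\W+$", "", y)          # c("abc", "abc_", "abc")
```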

Avinash Raj

You're almost there:

sub("[[:punct:]]+$", "", mystring)

You need to put [:punct:] inside a bracket expression and make it repeat one or more times by adding + after it. Also replace \Z with $, since sub without the perl = TRUE argument won't support \Z (which matches at the end of the string).

Example:

x <- c("fafasdf..", "sdfs?>", "safwe")
sub("[[:punct:]]+$", "", x)
# [1] "fafasdf" "sdfs"    "safwe"
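Wrapping that call in a small function makes it easy to check a few edge cases. This is only a sketch; strip_trailing_punct is a hypothetical name, not a base R function:

```r
# Hypothetical helper: removes a trailing run of punctuation from each element
strip_trailing_punct <- function(x) {
  sub("[[:punct:]]+$", "", x)
}

z <- c("a.b.", "?!", "safwe", NA)
out <- strip_trailing_punct(z)
# Interior punctuation is kept ("a.b"), an all-punctuation string becomes "",
# and NA passes through as NA
print(out)
```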

If you really want to use \\Z, enable the perl = TRUE argument:

sub("[[:punct:]]+\\Z", "", x, perl=TRUE)
# [1] "fafasdf" "sdfs"    "safwe" 
