lorenzbr

Reputation: 181

R Remove specific character with range of possible positions within string

I would like to remove the last 'V' from each string in a vector containing a large number of strings. The strings look similar to the following example:

str <- c("VDM 000 V2.1.1",
         "ABVC 001 V10.15.0",
         "ASDV 123 V1.20.0")

I know that it is always the last 'V' that I want to remove. I also know that this character is always the sixth, seventh, or eighth character from the end of the string.

I was not really able to come up with a nice solution. I know that I have to use sub or gsub but I can only remove all V's rather than only the last one.

Has anyone got an idea?

Thank you!

Upvotes: 3

Views: 1145

Answers (3)

akuiper

Reputation: 214927

Since you know it's the last V you want to remove from the string, try this regex V(?=[^V]*$):

gsub("V(?=[^V]*$)", "", str, perl = TRUE)
# [1] "VDM 000 2.1.1"    "ABVC 001 10.15.0" "ASDV 123 1.20.0" 

The lookahead (?=[^V]*$) lets a V match only when nothing but non-V characters follow it up to the end of the string, which guarantees that the matched V is the last V in the string. Lookaheads are PCRE syntax, which is why perl = TRUE is needed.
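For comparison, here is a sketch of the same idea without a lookahead. This variant is not part of the answer above: a greedy `.*` runs as far to the right as it can, so the `V` that follows it in the pattern has to be the last one. The name `x` is a stand-in for the question's `str`.

```r
# Stand-in for the question's vector (named x to avoid masking utils::str)
x <- c("VDM 000 V2.1.1",
       "ABVC 001 V10.15.0",
       "ASDV 123 V1.20.0")

# Lookahead version: (?=...) is PCRE syntax, hence perl = TRUE
res_lookahead <- sub("V(?=[^V]*$)", "", x, perl = TRUE)

# Greedy version: (.*) captures everything before the last V,
# and the replacement keeps only that captured prefix
res_greedy <- sub("(.*)V", "\\1", x)

print(res_greedy)
print(identical(res_lookahead, res_greedy))
```

Both calls use sub rather than gsub, since each pattern can match at most once per string.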

Upvotes: 1

aichao

Reputation: 7435

You can use:

gsub("V(\\d+\\.\\d+\\.\\d+)$","\\1",str)
##[1] "VDM 000 2.1.1"    "ABVC 001 10.15.0" "ASDV 123 1.20.0" 

The regex V(\\d+\\.\\d+\\.\\d+)$ matches the "version" at the end of the string (i.e., $): the character "V" followed by three sets of digits (i.e., \\d+) separated by two literal "." (written as \\., because an unescaped . matches any character). The parentheses around \\d+\\.\\d+\\.\\d+ form a group within the match that can be referenced by \\1. Therefore, gsub replaces the whole match with the group, thereby removing that "V".
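Because this pattern is tied to the version format, strings that do not end in a three-part version are left untouched. A small sketch of how one might check that before substituting is below. It assumes the separators are meant to be literal dots, so they are escaped here; the fourth string is a made-up entry added only for illustration.

```r
# First three strings are from the question; the last one is hypothetical
x <- c("VDM 000 V2.1.1",
       "ABVC 001 V10.15.0",
       "ASDV 123 V1.20.0",
       "XV 42 V1.2")          # only two version parts, so it should not match

# "V" followed by digits.digits.digits at the end of the string
pattern <- "V(\\d+\\.\\d+\\.\\d+)$"

# Which strings carry a V-prefixed three-part version?
has_version <- grepl(pattern, x)

# sub() returns non-matching strings unchanged
cleaned <- sub(pattern, "\\1", x)

print(has_version)
print(cleaned)
```

Inspecting `x[!has_version]` is a quick way to find entries that deviate from the expected format in a large vector.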

Upvotes: 2

IRTFM

Reputation: 263311

This regex pattern is written to match a "V" that is then followed by 5 to 7 other non-"V" characters. The "[...]" construct is a "character class", and within such constructs a leading "^" causes negation. The "{...}" construct takes two numbers specifying the minimum and maximum number of repetitions, and the "$" matches the length-0 end-of-string, which I think was desired when you wrote "sixth, seventh or eighth last character":

sub("(V)([^V]{5,7})$", "\\2", str)
# [1] "VDM 000 2.1.1"    "ABVC 001 10.15.0" "ASDV 123 1.20.0" 

Since you only wanted a single substitution I used sub instead of gsub.
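The positional assumption from the question (the last "V" is the sixth, seventh or eighth character from the end) can be checked directly before relying on a length-bounded pattern. The following is only a sketch; `x`, `pos` and `from_end` are illustrative names, not part of the answer.

```r
# Stand-in for the question's vector
x <- c("VDM 000 V2.1.1",
       "ABVC 001 V10.15.0",
       "ASDV 123 V1.20.0")

# Start position of the last V: a V followed only by non-V characters
pos <- as.integer(regexpr("V[^V]*$", x))

# Position of that V counted from the end of the string (1 = last character)
from_end <- nchar(x) - pos + 1L

print(from_end)
print(all(from_end %in% 6:8))
```

If the final check is FALSE for real data, the `{5,7}` bound would miss some strings, and a pattern without a length bound would be the safer choice.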

Upvotes: 3
