D Pinto

Reputation: 901

Exclude everything after the second occurrence of a certain string

I have the following string

string <- c('a - b - c - d',
            'z - c - b',
            'y',
            'u - z')

I would like to subset it such that everything after the second occurrence of ' - ' is thrown away.

The result would be this:

> string
[1]  "a - b" "z - c" "y"     "u - z"

I used substr(x = string, 1, regexpr(string, pattern = '[^ - ]*$') - 4), but it cuts at the last occurrence of ' - ' rather than the second, which is not what I want.

Upvotes: 6

Views: 4830

Answers (2)

spasfonx

Reputation: 170

Try this: (\w(?:\s+-\s+\w)?).*. For an explanation of the regex, see https://regex101.com/r/BbfsNQ/2.

The regex captures the first pair if there is one, or just the first character if there is no pair. The match is stored in a capturing group. How you refer to captured groups depends on the language, but in plain regex it is \1 for the first group (\2 for the second, and so on). See the "Substitution" section on regex101 if you want a visual example.
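As a rough sketch of how that pattern could be applied in R (the answer itself gives no R code, so the sub() call and perl = TRUE are my assumptions), the captured group is written back with "\\1". Note that \w matches a single word character, so this only works when every item is one character long, as in the question's data:

```r
string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')

# perl = TRUE is assumed here so that \w, \s and (?:...) use PCRE syntax
pattern <- "(\\w(?:\\s+-\\s+\\w)?).*"

# Replace the whole match with the text captured by group 1
result <- sub(pattern, "\\1", string, perl = TRUE)
result
# [1] "a - b" "z - c" "y"     "u - z"

# With longer items the optional pair cannot match after the first
# character, so only that character survives
multi <- sub(pattern, "\\1", "ab - cd - ef", perl = TRUE)
multi
# [1] "a"
```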

Upvotes: 0

Wiktor Stribiżew

Reputation: 626845

Note that you cannot use a negated character class to negate a sequence of characters. [^ - ]*$ matches any 0+ chars other than a space (yes, it matches -, too, because the - creates a range from a space to a space), followed by the end of the string marker ($).
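A quick check of this, as a sketch using the question's data (the variable names last_token_start and attempt are only for illustration), shows why the original attempt cuts at the last ' - ' and mangles the shorter strings:

```r
string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')

# The class [^ - ] only excludes the space, so a hyphen still matches it
hyphen_ok <- grepl("^[^ - ]*$", "a-b")   # TRUE
space_ok  <- grepl("^[^ - ]*$", "a b")   # FALSE

# Position where the trailing run of non-space characters begins
last_token_start <- regexpr("[^ - ]*$", string)

# Stepping back 4 characters cuts at the LAST ' - ', not the second one
attempt <- substr(string, 1, last_token_start - 4)
attempt
# [1] "a - b - c" "z - c"     ""          "u"
```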

You may use a sub function with the following regex:

^(.*? - .*?) - .*

to replace with \1. See the regex demo.

R code:

> string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')
> sub("^(.*? - .*?) - .*", "\\1", string)
[1] "a - b" "z - c" "y"     "u - z"

Details:

  • ^ - start of a string
  • (.*? - .*?) - Group 1 (referred to with the \1 backreference in the replacement pattern) capturing any 0+ chars lazily up to the first space, hyphen, space and then again any 0+ chars up to the next leftmost occurrence of space, hyphen, space
  • ' - ' - a space, a hyphen and a space
  • .* - any zero or more chars up to the end of the string.
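Elements with fewer than two ' - ' separators ('y' and 'u - z') do not match the pattern at all, and sub returns non-matching elements unchanged. As a cross-check, here is a minimal regex-free sketch that should give the same result by splitting on the literal separator and keeping the first two pieces. The helper name keep_first_two is hypothetical, not part of the answer:

```r
string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')

# Hypothetical helper: split on the literal separator, keep at most
# the first two pieces, and glue them back together
keep_first_two <- function(x, sep = " - ") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts,
         function(p) paste(head(p, 2), collapse = sep),
         character(1))
}

by_split <- keep_first_two(string)
by_regex <- sub("^(.*? - .*?) - .*", "\\1", string)

by_split
# [1] "a - b" "z - c" "y"     "u - z"
identical(by_split, by_regex)
# [1] TRUE
```

The split-based version avoids regex escaping issues if the separator ever contains special characters, at the cost of being a little longer.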

Upvotes: 6
