Reputation: 901
I have the following string
string <- c('a - b - c - d',
'z - c - b',
'y',
'u - z')
I would like to subset it such that everything after the second occurrence of ' - ' is thrown away.
The result would be this:
> string
[1] "a - b" "z - c" "y" "u - z"
I used substr(x = string, 1, regexpr(string, pattern = '[^ - ]*$') - 4), but that only removes everything from the last occurrence of ' - ' onward, which is not what I want.
Upvotes: 6
Views: 4830
Reputation: 170
Try this: (\w(?:\s+-\s+\w)?).*
For an explanation of the regex, see https://regex101.com/r/BbfsNQ/2.
The regex retrieves the first pair if one exists, or just the first character if there is no pair. The result is stored in a capturing group. How you display a captured group depends on the language, but in plain regex it is \1 for the first group (\2 for the second, and so on). See the "Substitution" section on regex101 for a visual example.
Upvotes: 0
Reputation: 626845
Note that you cannot use a negated character class to negate a sequence of characters. [^ - ]*$ matches any 0+ chars other than a space (yes, it matches - too, because the - creates a range between a space and a space) followed by the end-of-string marker ($).
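The point about the character class can be checked with a small sketch. It assumes perl = TRUE and uses the illustrative names m1, m2 and hyphen_token, none of which come from the answer. It compares '[^ - ]*$' with the simpler '[^ ]*$' and shows that a hyphen inside the last token is still matched.

```r
string <- c("a - b - c - d", "z - c - b", "y", "u - z")

# Both patterns describe the same class: any character except a space.
m1 <- regmatches(string, regexpr("[^ - ]*$", string, perl = TRUE))
m2 <- regmatches(string, regexpr("[^ ]*$", string, perl = TRUE))
print(m1)
print(identical(m1, m2))

# The hyphen is not excluded by the class, so it is part of the match here.
hyphen_token <- regmatches("a - b-c", regexpr("[^ - ]*$", "a - b-c", perl = TRUE))
print(hyphen_token)
```

So the class only ever stops at a space, which is why the original attempt locates the last token rather than the second separator.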
You may use the sub function with the following regex:
^(.*? - .*?) - .*
and replace the match with \1. See the regex demo.
R code:
> string <- c('a - b - c - d', 'z - c - b', 'y', 'u - z')
> sub("^(.*? - .*?) - .*", "\\1", string)
[1] "a - b" "z - c" "y" "u - z"
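The call above can also be checked on longer tokens. The following is a minimal sketch: keep_first_two and keep_first_two_split are hypothetical helper names, perl = TRUE is an assumption (the call above uses R's default engine), and the strsplit() variant is a different, regex-free technique included only as a cross-check.

```r
# Hypothetical helper wrapping the sub() call shown above.
# Assumption: perl = TRUE is added here; the original call omits it.
keep_first_two <- function(x) {
  sub("^(.*? - .*?) - .*", "\\1", x, perl = TRUE)
}

# Cross-check without a regex: split on the literal separator ' - '
# and glue at most the first two pieces back together.
keep_first_two_split <- function(x) {
  parts <- strsplit(x, " - ", fixed = TRUE)
  vapply(parts, function(p) paste(head(p, 2), collapse = " - "), character(1))
}

string <- c("a - b - c - d", "z - c - b", "y", "u - z")
longer <- c("alpha - beta - gamma", "one-two - three - four")

print(keep_first_two(string))
print(keep_first_two(longer))
print(identical(keep_first_two(longer), keep_first_two_split(longer)))
```

Strings with fewer than two separators do not match the pattern at all, so sub() returns them unchanged.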
Details:
^ - start of a string
(.*? - .*?) - Group 1 (referred to with the \1 backreference in the replacement pattern) capturing any 0+ chars lazily up to the first space, hyphen, space and then again any 0+ chars up to the next leftmost occurrence of space, hyphen, space
' - ' - a space, a hyphen and a space
.* - any zero or more chars up to the end of the string.
Upvotes: 6