Regex to match up to the first closing bracket

Question

I have a character vector named cars, which is as follows:

cars
[1] "Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair"   
[2] "Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition."

I need to extract the following parts from the string:

car(52;model-14557)
car(21, model-155)
car ( 36, model-8878)

I tried using the following piece of code to extract them:

stringr::str_extract_all(cars, "(.car\\s{0,5}\\(([^]]+)\\))")

This gave me the following output:

[[1]]
[1] " car(52;model-14557) had a good engine(workable condition)"

[[2]]
[1] " car(21, model-155) looked in good condition but car ( 36, model-8878)"

Is there a way to extract just the word car together with its associated number and model number?

Wiktor Stribiżew · Accepted Answer

Your regex does not work because [^]]+ matches one or more characters other than ], and that includes ( and ). Since the quantifier is greedy, the match runs from the first ( up to the last ) that has no ] in between.
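To make the difference concrete, here is a minimal sketch in base R that runs both character classes against a shortened copy of the first input string. The variable names (x, greedy, bounded) are only illustrative and do not come from the question.

```r
# Shortened copy of the first element of `cars` (illustrative input).
x <- "Only one car(52;model-14557) had a good engine(workable condition), others"

# [^]]+ excludes only "]", so it is free to run across ")" and "(".
greedy <- regmatches(x, regexpr("car\\s*\\([^]]+\\)", x))

# [^()]+ excludes both parentheses, so the match must stop at the first ")".
bounded <- regmatches(x, regexpr("car\\s*\\([^()]+\\)", x))

print(greedy)
print(bounded)
```

The first pattern should return the text up to the closing parenthesis after "workable condition", while the second should stop at "car(52;model-14557)".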

Use

> cars <- c("Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair","Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition.")
> library(stringr)
> str_extract_all(cars, "\\bcar\\s*\\([^()]+\\)")
[[1]]
[1] "car(52;model-14557)"

[[2]]
[1] "car(21, model-155)"    "car ( 36, model-8878)"

The regex is \bcar\s*\([^()]+\) (in an R string literal each backslash is doubled).

It matches:

\b - a word boundary
car - the literal char sequence
\s* - 0+ whitespaces
\( - a literal (
[^()]+ - 1 or more chars other than ( and )
\) - a literal ).
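The role of the word boundary is easiest to see on an input where car appears inside a longer word. The sketch below uses a made-up string (not from the question) and passes perl = TRUE, which is my own choice here to keep \b behaviour predictable across repeated matches; the answer's own calls use the default engine.

```r
# Hypothetical input: "scar" contains "car" but is a different word.
x <- "a scar(1, a) and a car(2, b)"

# With \b, "car" must start at a word boundary, so "scar(...)" is skipped.
with_boundary <- regmatches(
  x, gregexpr("\\bcar\\s*\\([^()]+\\)", x, perl = TRUE)
)[[1]]

# Without \b, the tail of "scar(1, a)" is matched as well.
without_boundary <- regmatches(
  x, gregexpr("car\\s*\\([^()]+\\)", x, perl = TRUE)
)[[1]]

print(with_boundary)
print(without_boundary)
```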

Note the same regex will yield the same results with the following base R code:

> regmatches(cars, gregexpr("\\bcar\\s*\\([^()]+\\)", cars))
[[1]]
[1] "car(52;model-14557)"

[[2]]
[1] "car(21, model-155)"    "car ( 36, model-8878)"
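If the number and the model are needed as separate values, one possible follow-up is to post-process each match with sub(). This is only a sketch: it assumes every match has the shape shown in the question (digits first, then model- followed by digits), and the names hits, number, model and parts are hypothetical.

```r
cars <- c(
  "Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair",
  "Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition."
)

# Same pattern as in the answer; flatten the per-string list into one vector.
hits <- unlist(regmatches(cars, gregexpr("\\bcar\\s*\\([^()]+\\)", cars)))

# Assumption: the first run of digits after "(" is the car number.
number <- sub("^car\\s*\\(\\s*([0-9]+).*$", "\\1", hits, perl = TRUE)

# Assumption: the model is "model-" followed by digits, just before ")".
model <- sub("^.*(model-[0-9]+)\\s*\\)$", "\\1", hits, perl = TRUE)

parts <- data.frame(match = hits, number = as.integer(number), model = model)
print(parts)
```

A match that does not follow the assumed shape would be returned unchanged by sub(), so it is worth checking the result before relying on it.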
