Reputation: 7704
I have a character vector named cars, which is as follows:
cars
[1] "Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair"
[2] "Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition."
I need to extract the following parts from the string:
car(52;model-14557)
car(21, model-155)
car ( 36, model-8878)
I tried using the following piece of code to extract them:
stringr::str_extract_all(cars, "(.car\\s{0,5}\\(([^]]+)\\))")
This gave me the following output:
[[1]]
[1] " car(52;model-14557) had a good engine(workable condition)"
[[2]]
[1] " car(21, model-155) looked in good condition but car ( 36, model-8878)"
Is there a way to extract only the word car together with its associated number and model number?
Upvotes: 2
Views: 332
Reputation: 627082
Your regex does not work because you are using [^]]+, which matches one or more characters other than ]. That class also matches ( and ), so the pattern matches from the first ( up to the last ) with no ] in between.
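The over-matching described above can be reproduced in base R on a shortened input. This is only a minimal sketch: the string x is a hypothetical one-line excerpt of the question's data, and the variable names greedy and bounded are made up for illustration.

```r
# Hypothetical input, shortened from the first element of cars
x <- "car(52;model-14557) had a good engine(workable condition), others"

# [^]]+ is allowed to cross parentheses, so the match runs on to the last ")"
greedy <- regmatches(x, regexpr("car\\s*\\([^]]+\\)", x))

# [^()]+ cannot cross a parenthesis, so the match stops at the first ")"
bounded <- regmatches(x, regexpr("car\\s*\\([^()]+\\)", x))

print(greedy)
print(bounded)
```

The only difference between the two patterns is the character class, which is what limits the match to a single pair of parentheses.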
Use
> cars <- c("Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair","Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition.")
> library(stringr)
> str_extract_all(cars, "\\bcar\\s*\\([^()]+\\)")
[[1]]
[1] "car(52;model-14557)"
[[2]]
[1] "car(21, model-155)" "car ( 36, model-8878)"
The regex is \bcar\s*\([^()]+\).
It matches:
\b - a word boundary
car - the literal char sequence
\s* - 0+ whitespaces
\( - a literal (
[^()]+ - 1 or more chars other than ( and )
\) - a literal )
Note that the same regex yields the same results with the following base R code:
> regmatches(cars, gregexpr("\\bcar\\s*\\([^()]+\\)", cars))
[[1]]
[1] "car(52;model-14557)"
[[2]]
[1] "car(21, model-155)" "car ( 36, model-8878)"
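If the number and the model are needed as separate values, the extracted matches can be split further with capture groups. The following is a rough sketch rather than part of the answer above: it assumes the separator is always ; or , and that the model is the run of digits after "model-". The names hits, pattern, and parts are hypothetical.

```r
cars <- c(
  "Only one car(52;model-14557) had a good engine(workable condition), others engine were damaged beyond repair",
  "Other car(21, model-155) looked in good condition but car ( 36, model-8878) looked to be in terrible condition."
)

# Same extraction as above, flattened into one character vector
hits <- unlist(regmatches(cars, gregexpr("\\bcar\\s*\\([^()]+\\)", cars)))

# Assumed layout: car ( <number> ; or , model-<digits> )
pattern <- "^car\\s*\\(\\s*([0-9]+)\\s*[;,]\\s*model-([0-9]+)\\s*\\)$"

parts <- data.frame(
  match  = hits,
  number = sub(pattern, "\\1", hits),  # first capture group
  model  = sub(pattern, "\\2", hits),  # second capture group
  stringsAsFactors = FALSE
)
print(parts)
```

A match that does not follow the assumed layout is returned unchanged by sub(), so the number and model columns are worth checking before relying on them.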
Upvotes: 3