Reputation: 1495
How would I extract specific substrings with stringr, based on a specific pattern?
For example, if I have the following coefficient in a tidy model table:
I(pmax(0, hp - 100))
I want to create two additional columns, one containing hp and one containing 100.
Example code:
library(tidyverse)
library(broom)
library(stringr)
#convert mtcars to a tibble and make cyl a factor
mtcars1 <- as_tibble(mtcars)
mtcars1$cyl <- as.factor(mtcars$cyl)
#run model and produce model-summary table
model <- glm(mpg ~ cyl + hp + I(pmax(0, hp - 100)), data = mtcars1)
model_summary <- tidy(model)
I've tried the following, which works on regex101.com but not in R:
model_summary_hp <- model_summary %>%
mutate(term1 = str_extract(term, regex("\I\(pmax\(0, ([a-z]+)\ - 100\)\)")),
knot = str_extract(term, regex("\I\(pmax\(0, [a-z]+ - ([0-9]+)\)\)")))
I get the following error:
Error: '\I' is an unrecognized escape in character string starting ""\I"
I'm not sure why it doesn't recognize the regex statement.
Upvotes: 2
Views: 381
Reputation: 626825
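One very important thing is to understand how to use an online regex tester: a pattern that works there will not necessarily work the same way in your target environment. Since you are using stringr functions, your patterns must be compatible with the ICU regex engine, and regex101 offers flavors such as PCRE, JavaScript, Python re and Go, but not ICU. If you use (g)sub instead, the regex must be compatible with the TRE engine, or with PCRE when you add perl=TRUE. On top of that, a pattern copied from a tester must become a valid R string literal: every regex backslash has to be doubled (\\( instead of \(), because R's parser processes escapes before any regex engine sees the pattern, and it rejects \I as an unknown escape, which is exactly the error you get.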
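A minimal base-R sketch of two ways to write the pattern as a valid R string: with doubled backslashes, and as a raw string (the r"(...)" syntax requires R 4.0.0 or later). The variable names are illustrative, and the leading \I from the question is written as a plain I, since the letter needs no escaping.

```r
# In a normal R string literal every regex backslash is doubled:
# "\\(" stores the two characters \ and (, which the regex engine reads as a literal "(".
doubled <- "I\\(pmax\\(0, ([a-z]+) - ([0-9]+)\\)\\)"

# Raw string (R >= 4.0.0): backslashes are kept exactly as typed,
# so a pattern can be pasted from an online tester unchanged.
raw <- r"(I\(pmax\(0, ([a-z]+) - ([0-9]+)\)\))"

# Both literals hold exactly the same regex text
identical(doubled, raw)

# The pattern matches the coefficient name produced by the model
grepl(doubled, "I(pmax(0, hp - 100))")
```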
Since you need to extract two values, the simplest way is two str_extract or two sub calls, one per value.
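If you would rather make a single call, stringr's str_match() returns a character matrix whose first column is the whole match and whose remaining columns hold the capture groups, so one pattern with two groups yields both values. A minimal sketch; the term vector is hand-written to mirror the term column of the tidy() output for this model, and the \\s* around the comma and minus sign is an assumption about spacing.

```r
library(stringr)

# Hand-written stand-in for model_summary$term
term <- c("(Intercept)", "cyl6", "cyl8", "hp", "I(pmax(0, hp - 100))")

# Group 1 = variable name, group 2 = knot; rows that do not match are NA in every column
m <- str_match(term, "I\\(pmax\\(0,\\s*([a-z]+)\\s*-\\s*(\\d+)\\)\\)")

term1 <- m[, 2]              # variable name as character
knot  <- as.numeric(m[, 3])  # knot converted to a number
```

Inside a dplyr pipeline the same columns could be taken from str_match(term, ...)[, 2] and [, 3] within mutate().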
A stringr approach:
1) "(?<=I\\(pmax\\(0, )[a-z]+" # or
"(?<=I\\(pmax\\(0,\\s{0,10})[a-z]+"
2) "\\d+(?=\\)\\))"
Here, the key points are the lookarounds. (?<=I\\(pmax\\(0, ) is a positive lookbehind: it requires I(pmax(0, (including the trailing space) immediately to the left of the current location, but does not put the matched text into the match value. (?=\\)\\)) is a positive lookahead that requires )) immediately to the right of the current location, again without consuming it.
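To see the two lookaround patterns at work, here is a minimal sketch applying them with str_extract() to a hand-written vector that mirrors model_summary$term (an assumption; in the real pipeline you would apply them to the tidy() output, e.g. inside mutate()). Rows where a pattern does not match come back as NA.

```r
library(stringr)

# Hand-written stand-in for model_summary$term
term <- c("(Intercept)", "cyl6", "cyl8", "hp", "I(pmax(0, hp - 100))")

# Lookbehind: "I(pmax(0, " must precede the match but is not part of it
term1 <- str_extract(term, "(?<=I\\(pmax\\(0, )[a-z]+")

# Lookahead: the digits must be followed by "))", which is not part of the match
knot <- str_extract(term, "\\d+(?=\\)\\))")

data.frame(term, term1, knot)
```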
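Note that the second version of the first regex may be rejected at regex101.com, because its lookbehind is constrained-width (\\s{0,10} matches 0 to 10 whitespace characters) rather than fixed-width, which PCRE traditionally does not allow; ICU accepts lookbehinds as long as their maximum length is bounded.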
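A small sketch of why the constrained-width variant can be useful: for a hypothetical term written without a space after the comma, the fixed-width lookbehind finds nothing, while the \\s{0,10} version still extracts the variable name.

```r
library(stringr)

# The second string is a hypothetical spelling without a space after "0,"
term <- c("I(pmax(0, hp - 100))", "I(pmax(0,hp - 100))")

# Fixed-width lookbehind: requires exactly one space after the comma
fixed_lb <- str_extract(term, "(?<=I\\(pmax\\(0, )[a-z]+")

# Bounded lookbehind: allows 0 to 10 whitespace characters after the comma
bounded_lb <- str_extract(term, "(?<=I\\(pmax\\(0,\\s{0,10})[a-z]+")
```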
A sub approach (TRE regex):
1) sub("I\\(pmax\\(\\d+,\\s*([a-z]+)\\s*-\\s*\\d+\\)\\)","\\1", term)
2) sub("I\\(pmax\\(\\d+,\\s*[a-z]+\\s*-\\s*(\\d+)\\)\\)","\\1", term)
Here, the idea is to match the whole string, capture the part you need, and replace the match with a backreference to that group, \1 (written "\\1" inside an R string).
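One thing to keep in mind with sub(): when a string does not match, it is returned unchanged rather than as NA, so a row such as "(Intercept)" would keep its own name in the new column. A minimal base-R sketch that guards the replacement with grepl(), which uses the same TRE engine by default; the guard, the hand-written term vector and the variable names are illustrative additions, not part of the patterns above.

```r
# Hand-written stand-in for model_summary$term
term <- c("(Intercept)", "cyl6", "cyl8", "hp", "I(pmax(0, hp - 100))")

# TRE patterns: the whole hinge term is matched and one part is captured
pat_var  <- "I\\(pmax\\(\\d+,\\s*([a-z]+)\\s*-\\s*\\d+\\)\\)"
pat_knot <- "I\\(pmax\\(\\d+,\\s*[a-z]+\\s*-\\s*(\\d+)\\)\\)"

# Only rows that contain the hinge term get a value; all other rows become NA
is_hinge <- grepl(pat_var, term)
term1 <- ifelse(is_hinge, sub(pat_var, "\\1", term), NA_character_)
knot  <- as.numeric(ifelse(is_hinge, sub(pat_knot, "\\1", term), NA))
```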
Upvotes: 1