Reputation: 2663
I have this rather hard-to-read regular expression:
pattern = "(?<=(?<=[0-9])[dD](?=[0-9]))[0-9]+"
It was generated automatically, so human readability and efficiency matter less than validity. It is meant to parse RPG dice-type syntax, such as 10d20. Specifically, it is supposed to match the 20.
If I use the old method of string matching in R
text = '10d20'
regmatches(text,regexpr(pattern,text,perl = TRUE))
I get what I want, which is 20. However, using the more modern method of string matching
stringr::str_match(text, pattern)
I get nothing. What causes this difference between the two methods, and how can I avoid issues like this in the future?
Upvotes: 1
Views: 277
Reputation: 78792
Unless you need the extras that come with ICU (via stringi, for which stringr is merely a crutch helper wrapper), there's no need for woe.
In fact, there's a pkg with less marketing power than tidyverse-based pkgs called stringb, which puts "data first" (like string[ir]) and relieves you from base regexp inanity. Vis-a-vis:
library(stringb)
pattern <- "(?<=(?<=[0-9])[dD](?=[0-9]))[0-9]+"
text <- '10d20'
text_extract(text, pattern, perl = TRUE)
## [1] "20"
You get saner syntax without relying on massive compiled-code dependencies and the 1-away* stringr abstraction. Bellissimo!
* TBFair: the stringb package is also a 1-away abstraction from base R functions, but the saner syntax makes up for it IMO (unlike stringr).
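If you would rather not add any package at all, here is a minimal base R sketch. It assumes the generated pattern can be flattened: the inner lookahead only checks that a digit follows the d, which the trailing [0-9]+ already requires, so a single fixed-length lookbehind should match the same text. The names simple_pattern and base_result are mine, not from the question.

```r
# Flattened pattern (an assumption: equivalent to the generated one,
# since the inner lookahead duplicates what [0-9]+ already enforces).
simple_pattern <- "(?<=[0-9][dD])[0-9]+"
text <- "10d20"

# Base R with the PCRE engine (perl = TRUE enables lookbehind).
base_result <- regmatches(text, regexpr(simple_pattern, text, perl = TRUE))
print(base_result)
## [1] "20"
```

A flat, bounded lookbehind like this is also a more conservative construct than a lookahead nested inside a lookbehind, so it may be worth trying with the ICU-backed functions too; I have not verified that here.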
Upvotes: 1