R regex find ranges in strings

Question

I have a set of email subject lines and want to flag whether each one contains a percentage within a given range (here, 10 to 20). This is my attempt, but it does not give the result I want:

library(stringi)

df1 <- data.frame(id = 1:5, string1 = NA)
df1$string1 <- c('15% off','25% off','35% off','45% off','55% off')

df1$pctOff10_20 <- stri_match_all_regex(df1$string1, '[10-20]%')


  id string1 pctOff10_20
1  1 15% off          NA
2  2 25% off          NA
3  3 35% off          NA
4  4 45% off          NA
5  5 55% off          NA

I'd like something like this:

  id string1 pctOff10_20
1  1 15% off          1
2  2 25% off          0
3  3 35% off          0
4  4 45% off          0
5  5 55% off          0

Cath · Accepted Answer

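A character class such as [10-20] matches a single character, not a numeric range, so spell the range out with alternation and count the matches instead: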

df1$pctOff10_20 <- stri_count_regex(df1$string1, '^(1\\d|20)%')

Explanation (inside an R string literal the backslash must be doubled, so the regex \d is written '\\d'):

^                        the beginning of the string
(                        group and capture to \1:
  1                        '1'
  \d                       any single digit (0-9)
 |                        OR
  20                       '20'
)                        end of \1
%                        '%'
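
As a rough check of the explanation above, the sketch below compares the original character-class pattern with the alternation pattern on the sample strings from the question. The variable names and the 'Save 15% today' string are made up for illustration, and the numeric comparison at the end is a different technique from the answer's regex, shown only as a possible alternative when the bounds change often.

```r
library(stringi)

strings <- c('15% off', '25% off', '35% off', '45% off', '55% off')

# '[10-20]' is a character class: one character that is '1', in '0'-'2', or '0'.
# Every sample has '5' directly before '%', so nothing matches.
in_class <- stri_detect_regex(strings, '[10-20]%')

# Alternation: '1' followed by any digit (10-19), or the literal '20'.
in_range <- stri_count_regex(strings, '^(1\\d|20)%')

# '^' anchors to the start of the string. For a percentage that appears
# mid-sentence (hypothetical example), a word boundary may be a better fit.
mid_sentence <- stri_count_regex('Save 15% today', '\\b(1\\d|20)%')

# Alternative (not the answer's method): extract the number and compare it.
pct <- as.numeric(stri_extract_first_regex(strings, '\\d+(?=%)'))
in_range_numeric <- as.integer(pct >= 10 & pct <= 20)
```

With the sample data, `in_range` and `in_range_numeric` should both flag only the first string. The numeric version avoids rewriting the regex for every new range, at the cost of assuming each string contains exactly one percentage of interest.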
