Ben

Reputation: 21625

How do I extract the first number that occurs after a matching pattern?

Consider these examples:

examples <- c(
  "abc foo",
  "abc foo 17",
  "0 abc defg foo 5 121",
  "abc 12 foo defg 11"
)

For each string, I would like to return the first number that occurs after "foo"; for these examples that is NA, 17, 5, 11. How can I do this? I tried a look-behind, but it throws an error:

library(stringr)
str_extract(examples, "(?<=foo.*)[0-9]+")

Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) : 
  Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT)
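The error comes from ICU, the regex engine stringr uses through stringi: a look-behind must have a bounded maximum length, and .* is unbounded. A minimal base R sketch that avoids look-behind altogether uses PCRE's \K (enabled with perl = TRUE), which drops everything matched before it from the reported match. The helper first_num_after() and its anchor argument are hypothetical names, and anchor is pasted into the pattern unescaped, so it assumes a literal string without regex metacharacters.

```r
# Hypothetical helper: first run of digits after the first `anchor`, NA if none.
# Assumes `anchor` is a literal string with no regex metacharacters.
first_num_after <- function(x, anchor = "foo") {
  # \D* skips non-digits after the anchor; \K discards "anchor + skipped text"
  # from the reported match, so only the digits are returned
  pattern <- paste0(anchor, "\\D*\\K\\d+")
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the successful matches, in order
  out[m != -1] <- regmatches(x, m)
  out
}

examples <- c("abc foo", "abc foo 17", "0 abc defg foo 5 121", "abc 12 foo defg 11")
first_num_after(examples)
# [1] NA   "17" "5"  "11"
```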

Upvotes: 3

Views: 414

Answers (3)

Wiktor Stribiżew

Reputation: 626738

You may use a base R solution like this:

> res <- gsub(".*?foo\\D*(\\d+).*|.*", "\\1", examples)
> res[nchar(res)==0] <- NA
> res
[1] NA   "17" "5"  "11"

Because the pattern always matches (the final |.* alternative catches strings with no number after foo), a single gsub() pass is enough: strings without a number come back as empty strings, and the second step turns those into NA.

The pattern matches:

  • .*?foo - zero or more chars, as few as possible (*? is lazy), up to the first occurrence of foo, and then foo itself
  • \\D* - zero or more non-digit chars
  • (\\d+) - Group 1, which captures one or more digits (the value stored in the group is later referred to with the \\1 backreference in the replacement)
  • .* - the rest of the string
  • | - OR
  • .* - the whole string, even an empty one (this branch is used when there is no number after foo, so Group 1 is empty)
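The two steps above can be wrapped in a small function. This is a sketch, not part of the original answer: the name extract_after_foo() is hypothetical, and perl = TRUE is an addition here so that the lazy .*? is handled by the PCRE engine.

```r
# Hypothetical wrapper around the gsub() + NA approach described above
extract_after_foo <- function(x) {
  # Alternative 1 captures the digits after the first "foo";
  # alternative 2 (.*) matches anything else and leaves "" behind
  res <- gsub(".*?foo\\D*(\\d+).*|.*", "\\1", x, perl = TRUE)
  # Empty results mean "no number after foo": mark them as NA
  res[nchar(res) == 0] <- NA
  res
}

examples <- c("abc foo", "abc foo 17", "0 abc defg foo 5 121", "abc 12 foo defg 11")
extract_after_foo(examples)
# [1] NA   "17" "5"  "11"
```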

Upvotes: 1

Damian

Reputation: 1433

Base R gsub can do it:

# pulls the first run of digits anywhere in the string (not necessarily after foo)
gsub('^\\D*(\\d*).*', '\\1', examples)
[1] ""   "17" "0"  "12"

Edit: actual solution using base R

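ifelse(
     grepl('foo\\D*\\d', examples),               # is there a digit after foo?
     gsub('^\\D*(\\d+).*', '\\1',                 # keep the first run of digits...
          gsub('.*foo\\s*', '', examples)),       # ...after dropping everything up to foo
     NA)                                          # no number after foo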
[1] NA   "17" "5"  "11"
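One thing to keep in mind, assuming a string may contain foo more than once: .*foo in the inner gsub() is greedy, so it strips everything up to the last foo. The sketch below uses a made-up input to compare that with a lazy ^.*?foo in sub() (perl = TRUE is an assumption, not from the answer); which behaviour is wanted depends on how "first number after foo" should be read.

```r
x <- "foo 1 foo 2"  # hypothetical input with two occurrences of foo

# Greedy: '.*foo' removes text up to the LAST "foo"
greedy <- gsub('^\\D*(\\d+).*', '\\1', gsub('.*foo\\s*', '', x))

# Lazy: '^.*?foo' stops at the FIRST "foo"
lazy <- sub('^.*?foo\\D*(\\d+).*$', '\\1', x, perl = TRUE)

greedy  # [1] "2"
lazy    # [1] "1"
```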

Upvotes: 0

Frank

Reputation: 66819

This seems to work:

library(stringr)
str_match(examples, "foo.*?(\\d+)")

     [,1]          [,2]
[1,] NA            NA  
[2,] "foo 17"      "17"
[3,] "foo 5"       "5" 
[4,] "foo defg 11" "11"
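If stringr is not available, a roughly equivalent base R sketch uses regexec() with regmatches(), which give the full match followed by the capture groups for each string (an empty vector where there is no match). Using perl = TRUE here is an assumption, chosen so the lazy .*? follows PCRE rules.

```r
examples <- c("abc foo", "abc foo 17", "0 abc defg foo 5 121", "abc 12 foo defg 11")

# One list element per string: c(full match, group 1), or character(0) if no match
m <- regmatches(examples, regexec("foo.*?(\\d+)", examples, perl = TRUE))

# Take group 1; indexing character(0)[2] yields NA, so non-matches become NA
vapply(m, function(g) g[2], character(1))
# [1] NA   "17" "5"  "11"
```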

From ?regex:

By default repetition is greedy, so the maximal possible number of repeats is used. This can be changed to ‘minimal’ by appending ? to the quantifier.
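To see what the appended ? changes, here is a small comparison on the third example string, again with base R regexec() and perl = TRUE (an assumption, not from the answer). With a greedy .*, the engine runs to the end of the string and backtracks only as far as needed for \d+ to match, which leaves just the final digit. The helper group1() is a hypothetical name.

```r
x <- "0 abc defg foo 5 121"

# Hypothetical helper: capture group 1 of the first match of `pattern` in `x`
group1 <- function(x, pattern) {
  regmatches(x, regexec(pattern, x, perl = TRUE))[[1]][2]
}

group1(x, "foo.*(\\d+)")   # greedy .* takes " 5 12", so group 1 is "1"
group1(x, "foo.*?(\\d+)")  # lazy .*? takes only " ", so group 1 is "5"
```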

From ?str_extract:

See Also

?str_match to extract matched groups; ?stri_extract for the underlying implementation.

Upvotes: 5
