Reputation: 21625
Consider these examples:
examples <- c(
  "abc foo",
  "abc foo 17",
  "0 abc defg foo 5 121",
  "abc 12 foo defg 11"
)
Here I would like to return the first number that occurs after "foo". In this case: NA, 17, 5, 11. How can I do this? I tried using a look-behind, but with no luck.
library(stringr)
str_extract(examples, "(?<=foo.*)[0-9]+")
Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) :
Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT)
Upvotes: 3
Views: 414
Reputation: 626738
You may use a base R solution like this:
> res <- gsub(".*?foo\\D*(\\d+).*|.*", "\\1", examples)
> res[nchar(res)==0] <- NA
> res
[1] NA "17" "5" "11"
Because the regex always matches the whole string, there is no need to run a second replacement; just replace the empty results with NA as a second step.
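A minimal sketch of this two-step approach, wrapped in a helper. The name extract_after_foo is hypothetical and not part of the original answer. It uses sub() with perl = TRUE, an assumption added here so that the alternatives are tried left to right; the call above uses gsub() with the default engine.

```r
examples <- c(
  "abc foo",
  "abc foo 17",
  "0 abc defg foo 5 121",
  "abc 12 foo defg 11"
)

# Hypothetical helper: first number found after "foo", or NA if there is none.
extract_after_foo <- function(x) {
  # Step 1: a single replacement. Strings with no number after "foo"
  # fall through to the second alternative (.*) and become "".
  res <- sub(".*?foo\\D*(\\d+).*|.*", "\\1", x, perl = TRUE)
  # Step 2: turn the empty results into NA.
  res[!nzchar(res)] <- NA_character_
  res
}

res <- extract_after_foo(examples)
print(res)
```

The result is a character vector; wrap it in as.integer() if numbers are needed.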
The pattern matches:
.*?foo - any 0+ chars as few as possible (since *? is lazy) up to the first occurrence of foo and then foo itself
\\D* - zero or more non-digit chars
(\\d+) - Group 1 that captures 1 or more digits (later, the value stored in the group can be referred with \1 backreference)
.* - the rest of the string
| - OR
.* - the whole string even if empty.
Upvotes: 1
Reputation: 1433
Base R gsub can do it:
# pulls first instance of a digit
gsub('^\\D*(\\d*).*', '\\1', examples)
[1] "" "17" "0" "12"
Edit: actual solution using base R
ifelse(
  grepl('foo\\D*\\d', examples),
  gsub('^\\D*(\\d+).*', '\\1', gsub('.*foo\\s*', '', examples)),
  NA)
[1] NA "17" "5" "11"
Upvotes: 0
Reputation: 66819
This seems to work:
str_match(examples, "foo.*?(\\d+)")
     [,1]          [,2]
[1,] NA            NA
[2,] "foo 17"      "17"
[3,] "foo 5"       "5"
[4,] "foo defg 11" "11"
From ?regex:
By default repetition is greedy, so the maximal possible number of repeats is used. This can be changed to ‘minimal’ by appending ? to the quantifier.
From ?str_extract:
See Also
?str_match to extract matched groups; ?stri_extract for the underlying implementation.
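To illustrate the greedy versus lazy difference quoted above, here is a small sketch in base R. regexec() and regmatches() stand in for str_match() so the block has no package dependency. The helper name first_group is hypothetical, and perl = TRUE is an assumption made here to pin the regex engine.

```r
x <- "0 abc defg foo 5 121"

# Hypothetical helper: return capture group 1, or NA when there is no match.
first_group <- function(pattern, text) {
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  if (length(m) >= 2) m[2] else NA_character_
}

# Greedy: .* runs to the end of the string, then backtracks just far
# enough for \\d+ to match, so the group holds only the last digit.
greedy <- first_group("foo.*(\\d+)", x)

# Lazy: .*? grows one character at a time, so \\d+ matches the first
# run of digits after "foo".
lazy <- first_group("foo.*?(\\d+)", x)

print(c(greedy = greedy, lazy = lazy))
```

Here the greedy pattern captures "1" (the final digit of the string), while the lazy pattern captures "5", which is the number wanted in the question.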
Upvotes: 5