pysy9987

Reputation: 97

Regex: extract a number after a string that contains a number

Suppose I have a string:

str <- "England has 90 cases(1 discharged, 5 died); Scotland has 5 cases(2 discharged, 1 died)"

How can I grab the number of discharged cases in England?

I have tried

sub("(?i).*England has [\\d] cases(.*?(\\d+).*", "\\1", str),

It's returning the original string. Many Thanks!

Upvotes: 1

Views: 67

Answers (3)

akrun

Reputation: 887981

We can use regmatches/gregexpr with a lookahead: match one or more digits (\\d+) only where they are followed by a space and 'discharged' ((?= discharged)). This extracts every discharge count in the string

as.integer(regmatches(str, gregexpr("\\d+(?= discharged)", str, perl = TRUE))[[1]])
#[1] 1 2

If it is specific only to 'England', start the pattern with 'England', followed by one or more characters that are not a ( ([^(]+) and then a literal ( (\\(). Capture the digits that follow as a group ((\\d+)), and in the replacement specify the backreference (\\1) of the captured group

sub("England[^(]+\\((\\d+).*", "\\1", str)
#[1] "1"

Or if we go by the OP's option, the ( after 'cases' should be escaped, because an unescaped ( is a metacharacter that opens a capture group. Also, [\\d] matches only a single digit, so it cannot match '90'; use \\d+ outside the square brackets instead

sub("(?i)England has \\d+ cases\\((\\d+).*", "\\1", str)
#[1] "1"
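
To see why the [\\d] in the original attempt cannot match '90', here is a small check. It is only a sketch: the names one_digit and many_digits are made up for illustration, and perl = TRUE is passed so that \\d inside the brackets is read as the digit class.

```r
str <- "England has 90 cases(1 discharged, 5 died); Scotland has 5 cases(2 discharged, 1 died)"

# [\\d] is a character class that matches exactly one digit,
# so "has 90 cases" (two digits) does not fit the pattern
one_digit <- grepl("England has [\\d] cases", str, perl = TRUE)

# \\d+ matches one or more digits, so "90" is accepted
many_digits <- grepl("England has \\d+ cases", str, perl = TRUE)

print(one_digit)   # FALSE
print(many_digits) # TRUE
```

When the pattern does not match at all, sub() leaves its input unchanged, which is why a failed match hands back the original string.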

Upvotes: 1

Ronak Shah

Reputation: 389335

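We can use str_match to capture the number before "discharged". The lazy .*? stops at the first place where digits are followed by " discharged", and column 2 of the result holds the captured group.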

stringr::str_match(str, "England.*?(\\d+) discharged")[, 2]
#[1] "1"
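
If the same lookup is needed for other regions, the capture-group idea can be wrapped in a small base R function. This is a sketch, not part of the answer above: discharged_for is a hypothetical helper, and it assumes the region name contains no regex metacharacters and that the text always has the form "<region> has N cases(M discharged, ...".

```r
str <- "England has 90 cases(1 discharged, 5 died); Scotland has 5 cases(2 discharged, 1 died)"

# Hypothetical helper: discharged count for one region, NA if not found.
# Assumes `region` has no regex metacharacters.
discharged_for <- function(text, region) {
  # region, then anything up to the first "(", then the captured digits
  pattern <- paste0(region, "[^(]*\\(([0-9]+) discharged")
  m <- regmatches(text, regexec(pattern, text))[[1]]
  # m[1] is the whole match, m[2] the captured group; empty if no match
  if (length(m) == 0) NA_integer_ else as.integer(m[2])
}

discharged_for(str, "England")
discharged_for(str, "Scotland")
```

With the example string, the two calls give 1 and 2; a region that does not appear in the text gives NA.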

Upvotes: 1

2239559319

Reputation: 124

The regex is \d+(?= discharged); take the first match. In an R string the backslash must be doubled (\\d+), and the lookahead needs perl = TRUE.
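
A minimal sketch of that in base R, assuming England's entry comes first in the string (as it does in the question). regexpr() returns only the first match, unlike gregexpr(); the name first_match is just for illustration.

```r
str <- "England has 90 cases(1 discharged, 5 died); Scotland has 5 cases(2 discharged, 1 died)"

# regexpr() finds the first match only; perl = TRUE enables the lookahead
first_match <- regmatches(str, regexpr("\\d+(?= discharged)", str, perl = TRUE))

as.integer(first_match)
#[1] 1
```

This relies on position, so it would return Scotland's number if the order of the entries were swapped.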

Upvotes: 0
