Reputation: 97
Suppose I have a string:
str <- "England has 90 cases(1 discharged, 5 died); Scotland has 5 cases(2 discharged, 1 died)"
How can I grab the number of discharged cases in England?
I have tried
sub("(?i).*England has [\\d] cases(.*?(\\d+).*", "\\1", str)
but it returns the original string. Many thanks!
Upvotes: 1
Views: 67
Reputation: 887981
We can use regmatches/gregexpr to match one or more digits (\\d+) followed by a space and 'discharged', which extracts the number of discharges:
as.integer(regmatches(str, gregexpr("\\d+(?= discharged)", str, perl = TRUE))[[1]])
#[1] 1 2
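The call above returns the discharged counts for every region in order. A minimal sketch of how those counts could be labelled by region so that England's value can be picked out by name; the variable names discharged and regions are illustrative, and the region pattern assumes single-word region names followed by " has":

```r
str <- "England has 90 cases(1 discharged, 5 died); Scotland has 5 cases(2 discharged, 1 died)"

# Digits immediately followed by " discharged" (the lookahead needs perl = TRUE)
discharged <- as.integer(regmatches(str, gregexpr("\\d+(?= discharged)", str, perl = TRUE))[[1]])

# Assumption: each region is a single word that directly precedes " has"
regions <- regmatches(str, gregexpr("[A-Za-z]+(?= has)", str, perl = TRUE))[[1]]

# Label the counts, then look up England by name
names(discharged) <- regions
discharged[["England"]]
#[1] 1
```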
If it is specific only to 'England', start with 'England' followed by characters that are not a ( ([^(]+) and then a literal (, capture the digits (\\d+) as a group, and in the replacement specify the backreference (\\1) of the captured group:
sub("England[^(]+\\((\\d+).*", "\\1", str)
#[1] "1"
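Note that sub returns its input unchanged when the pattern fails to match, so a failed match can be mistaken for a result. As a hedged alternative, regexec with regmatches returns the captured group directly and gives NA when nothing matches. The helper name first_in_parens is hypothetical and not part of the answer above:

```r
str <- "England has 90 cases(1 discharged, 5 died); Scotland has 5 cases(2 discharged, 1 died)"

# Hypothetical helper: capture the first number after "(" for a given region
first_in_parens <- function(region, x) {
  pattern <- paste0(region, "[^(]+\\((\\d+)")
  m <- regmatches(x, regexec(pattern, x))[[1]]
  # m[1] is the full match and m[2] the captured digits; m[2] is NA if there is no match
  as.integer(m[2])
}

england_discharged <- first_in_parens("England", str)
england_discharged
#[1] 1

# A region that is absent from the string yields NA instead of the original string
missing_region <- first_in_parens("Wales", str)
missing_region
#[1] NA
```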
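Or, if we follow the OP's approach, the ( after 'cases' should be escaped, because ( is a metacharacter that opens a capture group. Also, a bracket expression matches only a single character, so \\d+ should be placed outside the square brackets to match a multi-digit number such as 90: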
sub("(?i)England has \\d+ cases\\((\\d+).*", "\\1", str)
#[1] "1"
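The two fixes can be checked in isolation with grepl. This is only an illustrative sketch; it uses [0-9] in place of \\d inside the brackets, and the variable names are made up for the example:

```r
str <- "England has 90 cases(1 discharged, 5 died); Scotland has 5 cases(2 discharged, 1 died)"

# A bracket expression matches exactly one character, so the two-digit "90" does not fit
one_digit <- grepl("England has [0-9] cases", str)
one_digit
#[1] FALSE

# With the + quantifier, one or more digits are allowed and the pattern matches
many_digits <- grepl("England has [0-9]+ cases", str)
many_digits
#[1] TRUE

# An escaped "(" is treated as a literal parenthesis rather than the start of a group
literal_paren <- grepl("cases\\(", str)
literal_paren
#[1] TRUE
```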
Upvotes: 1
Reputation: 389335
We can use str_match to capture the number before "discharged".
stringr::str_match(str, "England.*?(\\d+) discharged")[, 2]
#[1] "1"
Upvotes: 1