Reputation: 21
I have a character vector like this:
test <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.", "Matthew got a score of 7.6")
The desired output is c(8.8, 7.6).
Basically, for each string I want the sum of the numbers that follow the "score" pattern.
I tried:
s <- as.numeric(gsub(pattern = "^\\D*score\\D*(\\d+\\.*\\d*)\\D*score*\\D*(\\d*\\.*\\d*)\\D*$", replacement = ("\\1"), test$Purpose)) +
as.numeric(gsub(pattern = "^\\D*score\\D*(\\d+\\.*\\d*)\\D*score*\\D*(\\d*\\.*\\d*)\\D*$", replacement = ("\\2"), test$Purpose))
However, this returns NAs.
Upvotes: 0
Views: 467
Reputation: 887128
We can extract the numbers that follow "score of " or "scored " with a regex lookbehind and then sum them for each string:
library(stringr)
sapply(str_extract_all(test, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+"),
       function(x) sum(as.numeric(x)))
#[1] 8.8 7.6
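As for why the original attempt returns NA, here is a small check, assuming `test` is the plain character vector from the question (so `test` is used directly instead of `test$Purpose`). The pattern is anchored with `^` and `$` and expects "score" twice, with no other digits after the second number. The first string has a third number ("4th") and the second string has only one "score", so neither matches. `gsub` then returns the strings unchanged and `as.numeric` turns them into NA.

```r
test <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.",
          "Matthew got a score of 7.6")

# The pattern from the question, unchanged
pattern <- "^\\D*score\\D*(\\d+\\.*\\d*)\\D*score*\\D*(\\d*\\.*\\d*)\\D*$"

# Does the anchored pattern match each whole string?
matched <- grepl(pattern, test)

# gsub leaves non-matching strings untouched, so nothing is replaced
unchanged <- gsub(pattern, "\\1", test) == test

matched
unchanged
```

Since nothing is replaced, `as.numeric()` receives the full sentences and cannot parse them, which is where the NAs come from.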
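Or using tidyverse, extracting every standalone number (this works here because the word boundaries keep the "4" in "4th" from matching)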
library(dplyr)
library(purrr)
str_extract_all(test, "\\b[0-9.]+\\b") %>%
  map_dbl(~ as.numeric(.x) %>%
            sum)
#[1] 8.8 7.6
Or, if we need only the numbers that come after "score of " or "scored "
str_extract_all(test, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+") %>%
  map_dbl(~ as.numeric(.x) %>%
            sum)
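If adding packages is not an option, the same lookbehind idea can be sketched in base R with `gregexpr()` and `regmatches()`. This is only one possible variant, not a replacement for the code above: `perl = TRUE` is needed for the lookbehind, the names `pattern`, `matches` and `score_sums` are just illustrative, and the number part is written slightly more strictly (`[0-9]+(?:\\.[0-9]+)?`) so that a lone "." cannot match.

```r
test <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.",
          "Matthew got a score of 7.6")

# Numbers directly preceded by "score of " or "scored " (PCRE lookbehind)
pattern <- "(?<=score of )[0-9]+(?:\\.[0-9]+)?|(?<=scored )[0-9]+(?:\\.[0-9]+)?"

# One character vector of matches per input string
matches <- regmatches(test, gregexpr(pattern, test, perl = TRUE))

# Sum the matches for each string; a string with no match gives 0
score_sums <- vapply(matches, function(x) sum(as.numeric(x)), numeric(1))
score_sums
```

`vapply()` with `numeric(1)` is used instead of `sapply()` so the result is always a numeric vector, even when the input is empty.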
Upvotes: 3