shivrajgondi

Reputation: 21

How to get a sum of numbers in a character string?

The character vector looks like this:

test <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.", "Matthew got a score of 7.6")

The desired output is c(8.8, 7.6), i.e. for each string, the sum of the numbers that follow the "score" pattern.

I tried:

s <- as.numeric(gsub(pattern = "^\\D*score\\D*(\\d+\\.*\\d*)\\D*score*\\D*(\\d*\\.*\\d*)\\D*$", replacement = ("\\1"), test$Purpose)) + 
        as.numeric(gsub(pattern = "^\\D*score\\D*(\\d+\\.*\\d*)\\D*score*\\D*(\\d*\\.*\\d*)\\D*$", replacement = ("\\2"), test$Purpose))

However, this returns NAs.

Upvotes: 0

Views: 467

Answers (1)

akrun

Reputation: 887128

We can extract the numbers with a regex and then sum them:

library(stringr)
sapply(str_extract_all(test, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+"),
       function(x) sum(as.numeric(x)))
#[1] 8.8 7.6
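
If stringr is not available, the same extract-then-sum idea can be sketched in base R with `gregexpr()` and `regmatches()`. This is only an illustration using the answer's pattern; the names `pat`, `hits` and `base_sums` are made up for the example.

```r
# Same input as in the question.
test <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.",
          "Matthew got a score of 7.6")

# Lookbehind is not supported by the default regex engine, hence perl = TRUE.
pat <- "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+"
hits <- regmatches(test, gregexpr(pat, test, perl = TRUE))

# One sum per input string.
base_sums <- vapply(hits, function(x) sum(as.numeric(x)), numeric(1))
base_sums
```

`vapply()` is used instead of `sapply()` so the result is always a numeric vector, even when a string contains no match (the sum of an empty vector is 0).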

Or, using the tidyverse, extract every standalone number (the word boundaries keep "4th" out):

library(dplyr)
library(purrr)
str_extract_all(test, "\\b[0-9.]+\\b") %>%
  map_dbl(~ sum(as.numeric(.x)))
#[1] 8.8 7.6

Or, if we only need the numbers that come after "score":

str_extract_all(test, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+") %>%
  map_dbl(~ sum(as.numeric(.x)))
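
One caveat with `[0-9.]+`: it also swallows a sentence-ending full stop that directly follows a number, and `as.numeric("7.6.")` is `NA`. A possible tightening, sketched below on a hypothetical input `test2` (not from the question), only accepts a dot when digits follow it. The names `strict` and `strict_sums` are illustrative.

```r
library(stringr)

# Hypothetical input: the second string ends with a full stop after the number.
test2 <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.",
           "Matthew got a score of 7.6.")

# [0-9]+(?:\\.[0-9]+)? matches an integer or a decimal, but never a trailing dot.
strict <- "(?<=score of )[0-9]+(?:\\.[0-9]+)?|(?<=scored )[0-9]+(?:\\.[0-9]+)?"

# One sum per input string.
strict_sums <- sapply(str_extract_all(test2, strict),
                      function(x) sum(as.numeric(x)))
strict_sums
```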

Upvotes: 3
