Reputation:
Here's the current dataframe:
baking_time <- c("20 to 30 min", "20 to 30 min", "40 to 50 min", "10 to 30 min", "60 to 90 min", "40 to 50 min")
cake_type <- c("Chocolate", "Chocolate","Lemon","Tart","German","Lemon")
recipes <- data.frame(baking_time, cake_type)
Now I'm trying to parse baking_time to get this:
baking_time <- c(25, 25, 45, 20, 75, 45)
I've tried using parse but I'm having trouble parsing both numbers than performing operations on them
mutate(avg_time = (parse_number(baking_time) + parse_number(baking_time))/2)
Upvotes: 8
Views: 223
Reputation: 78792
stringi
(the un-crutched stringr
) and base R:
stringi::stri_match_first_regex(
recipes$baking_time,
"([[:digit:]]+)[[:space:]]+to[[:space:]]+([[:digit:]]+)",
)[,2:3] -> x
class(x) <- "numeric"
apply(x, 1, mean)
## [1] 25 25 45 20 75 45
Upvotes: 2
Reputation: 887088
We extract the numeric part of the column and get the mean
library(tidyverse)
recipes %>%
mutate(avg_time = str_extract_all(baking_time, "\\d+") %>%
map(., ~ mean(as.numeric(.x))))
# baking_time cake_type avg_time
#1 20 to 30 min Chocolate 25
#2 20 to 30 min Chocolate 25
#3 40 to 50 min Lemon 45
#4 10 to 30 min Tart 20
#5 60 to 90 min German 75
#6 40 to 50 min Lemon 45
NOTE: readr::parse_number
extracts the first numeric part. If there are multiple elements, needs to break it down and apply parse_number
recipes %>%
separate(baking_time, into = c("first", "second"),
sep=" to ", remove = FALSE) %>%
transmute(baking_time, avg_time = (parse_number(first) + parse_number(second))/2)
With base R
, one option is to read with read.csv
after changing the non-numeric part to a delimiter with gsub
, get the rowMeans
rowMeans(read.csv(text=gsub("\\D+", ",", recipes$baking_time), header = FALSE)[-3])
#[1] 25 25 45 20 75 45
Upvotes: 7
Reputation: 37641
You can do this in base R using gregexpr
and regmatches
to get the times.
Times = regmatches(baking_time, gregexpr("\\d+", baking_time))
sapply(Times, function(x) mean(as.numeric(x)))
[1] 25 25 45 20 75 45
Upvotes: 4