Reputation: 79
another day with new complex faced
Below are the columns and rows that I have as input:
ID Age
123 23 Years 1 Month 2 Days
125 28 Years 9 Month 14 Days
126 28 years
127 34 YEAR
128 35 Years 8 Month 21 Days
129 38 Years 5 Month 25 Days
130 32.8
I need them as yearly calculated in new columns like:
ID Age Age_new
123 23 Years 1 Month 2 Days 23.1
125 28 Years 9 Month 14 Days 28.9
126 28 years 28
127 34 YEAR 34
128 35 Years 8 Month 21 Days 35.8
129 38 Years 5 Month 25 Days 38.5
130 32.8 32.8
I have tried the by stringr
package but I get only first character string
which doesn't provide like the above.
Upvotes: 0
Views: 667
Reputation: 11
This is my approach. I always try to avoid regex since it's too scary for me. If your data is exactly separated like your example, I think my code will work. I completely understand this is not the most efficient way. but heyy it works
dat %>%
mutate(space_counter = stringr::str_count(Age," ")) %>%
tidyr::separate(Age,into = paste0("tmp_col_",1:(max(.$space_counter)+1)),sep = " ") %>%
select(ID, tmp_col_1,tmp_col_3,tmp_col_5) %>%
setNames(c("ID","year","month","day")) %>%
mutate(across(everything(), ~replace_na(.x, 0))) %>%
mutate_if(is.character,as.integer) %>%
mutate(asdur = as.duration(years(year) + months(month) + days(day))) %>%
mutate(age_new = as.numeric(asdur)/3.154e+7)
Upvotes: 0
Reputation: 160437
Here's a gross approximation:
func <- function(x, ptn) {
out <- gsub(paste0(".*?\\b([0-9.]+)\\s*", ptn, ".*"), "\\1", x, ignore.case = TRUE)
ifelse(out == x, NA, out)
}
library(dplyr)
dat %>%
mutate(
data.frame(
lapply(c(yr = "year", mon = "month", day = "day"),
function(ptn) as.numeric(func(Age, ptn)))
),
yr = if_else(is.na(yr), suppressWarnings(as.numeric(Age)), yr),
across(c(yr, mon, day), ~ coalesce(., 0)), New_Age = yr + mon/12 + day/365
)
# ID Age yr mon day New_Age
# 1 123 23 Years 1 Month 2 Days 23.0 1 2 23.08881
# 2 125 28 Years 9 Month 14 Days 28.0 9 14 28.78836
# 3 126 28 years 28.0 0 0 28.00000
# 4 127 34 YEAR 34.0 0 0 34.00000
# 5 128 35 Years 8 Month 21 Days 35.0 8 21 35.72420
# 6 129 38 Years 5 Month 25 Days 38.0 5 25 38.48516
# 7 130 32.8 32.8 0 0 32.80000
(I offer no warranty on true accuracy.)
Data
dat <- structure(list(ID = c(123L, 125L, 126L, 127L, 128L, 129L, 130L), Age = c("23 Years 1 Month 2 Days", "28 Years 9 Month 14 Days", "28 years", "34 YEAR", "35 Years 8 Month 21 Days", "38 Years 5 Month 25 Days", "32.8")), class = "data.frame", row.names = c(NA, -7L))
Upvotes: 1