Elizabeth Robinson

Reputation: 9

Replacing week/month times as strings with time in years? (R)

I found a pet adoption dataset that includes the age of each pet when it was adopted. However, the age variable contains strings like "3 months", "4 years", or "3 weeks", all in the same column. The dataset is otherwise tidy. How can I convert these values into years?

I've tried something like this:

for(i in i:nrow(Pet_Train$AgeuponOutcome)){
 if(grepl(i, "month") == TRUE)
   Pet_Train$Age_in_Years[i] == "0"

}

But I have little experience with loops, if statements, or this "grepl" function I just looked up. I do have experience with tidy functions like mutate() and filter(), but I'm not sure how to apply them with this many possible argument combinations.

There are 27,000 instances, so I'd rather not go through this by hand.

Edit: I figured out how to use the grepl function to replace instances containing "month" with "less than a year." But is there a way to take the exact number of months and convert them into the year as a decimal?

Upvotes: 0

Views: 101

Answers (4)

rg255

Reputation: 4169

Just to expand on the comment I left, you could use ifelse. First though, here's a reproducible example of your data (always very useful for you to provide this when asking a question):

df <- data.frame("Duration" = c("3 months", "4 years", "3 weeks"))

You can then split out the units and values from this using string split:

df$Value <- as.numeric(vapply(strsplit(as.character(df$Duration), split = " "), `[`, 1, FUN.VALUE=character(1)))
df$Units <- vapply(strsplit(as.character(df$Duration), split = " "), `[`, 2, FUN.VALUE=character(1))

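Finally, use nested ifelse calls, which tell R what to do when a value matches a condition and what to do otherwise. Here, if the units are weeks, the value is divided by 365.25/7 (about 52.18, the average number of weeks per year); if the units are months, it is divided by 12; otherwise the value is assumed to be in years already and is left unchanged.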

df$Years <- ifelse(df[,'Units']=="weeks", df[,'Value']/(365.25/7), ifelse(df[,'Units']=="months", df[,'Value']/12, df[,'Value']))

And the successful output:

> df
  Duration Value  Units      Years
1 3 months     3 months 0.25000000
2  4 years     4  years 4.00000000
3  3 weeks     3  weeks 0.05749487

Note: It would be more appropriate to do this with "days" as your unit of time, which could be done if you had dates for the first and second event (birth and adoption dates of the animal). This is because years and months are variable-length units: December is longer than February, and 2016 was longer than 2015 and 2017.
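To illustrate the note above, here is a minimal base R sketch of the date-based approach. The data frame pets and its columns birth_date and outcome_date are hypothetical; the question does not say the dataset contains such dates.

```r
# Hypothetical data: these columns are assumptions for illustration only.
pets <- data.frame(
  birth_date   = as.Date(c("2015-01-10", "2016-02-01")),
  outcome_date = as.Date(c("2016-01-10", "2016-03-01"))
)

# Exact age in days, taken from the difference between the two dates
pets$age_days <- as.numeric(difftime(pets$outcome_date, pets$birth_date,
                                     units = "days"))

# Approximate age in years, using the same 365.25-day average year as above
pets$age_years <- pets$age_days / 365.25
```

The day count is exact, while the division by 365.25 is only an average, so the years column remains an approximation.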

Upvotes: 0

Uwe

Reputation: 42544

The answer of David Rubinger uses the lubridate package to coerce character strings to objects of class Duration.

The as.duration() function seems to recognize a variety of strings, e.g.,

age_text <- c("3 months", "4 years", "3 weeks", "52 weeks", "365 days 6 hours")
lubridate::as.duration(age_text)
[1] "7889400s (~13.04 weeks)" "126230400s (~4 years)"   "1814400s (~3 weeks)"    
[4] "31449600s (~52 weeks)"   "31557600s (~1 years)"

However, the OP has requested to convert the strings into year values rather than seconds.

This can be achieved by using the as.numeric() function which takes a units parameter to specify the desired conversion:

as.numeric(lubridate::as.duration(age_text), units = "years")
[1] 0.25000000 4.00000000 0.05749487 0.99657769 1.00000000

Other units can be chosen as well:

as.numeric(lubridate::as.duration(age_text), units = "months")
[1]  3.0000000 48.0000000  0.6899384 11.9589322 12.0000000
as.numeric(lubridate::as.duration(age_text), units = "weeks")
[1]  13.04464 208.71429   3.00000  52.00000  52.17857

Upvotes: 0

G. Grothendieck

Reputation: 269644

The first two use only base R and the third uses dplyr and tidyr.

1) Use read.table to split the input column into the numeric and units parts and then multiply the numeric part by the fraction of a year that the units part represents.

PT <- data.frame(Age = c("3 months", "4 years", "3 weeks")) # input

transform(cbind(PT, read.table(text = as.character(PT$Age))), 
  Years = V1 * (7 / 365.25 * (V2 == "weeks") + 1/12 * (V2 == "months") + (V2 == "years")))

giving:

       Age V1     V2      Years
1 3 months  3 months 0.25000000
2  4 years  4  years 4.00000000
3  3 weeks  3  weeks 0.05749487

2) Alternatively, the last line could be written in terms of switch:

transform(cbind(PT, read.table(text = as.character(PT$Age), as.is = TRUE)), 
  Years = V1 * sapply(V2, switch, weeks = 7 / 365.25, months = 1 / 12, years = 1))

3) This uses dplyr and tidyr:

library(dplyr)  # mutate, case_when, %>%
library(tidyr)  # separate

PT %>%
   separate(Age, c("No", "Units")) %>%
   mutate(No = as.numeric(No), 
          Years = No * case_when(Units == "weeks" ~ 7 / 365.25,
                                 Units == "months" ~ 1 / 12,
                                 Units == "years" ~ 1))

giving:

  No  Units      Years
1  3 months 0.25000000
2  4  years 4.00000000
3  3  weeks 0.05749487
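The approaches above match the plural unit names exactly. If the real column also contains singular units or days (for example "1 year" or "5 days", which the question does not confirm), a lookup vector is one possible variant. This is only a sketch: the helper name age_to_years and the "day" entry are assumptions, and the conversion factors are the same ones used above.

```r
# Hypothetical helper: convert strings such as "3 months" to years.
age_to_years <- function(x) {
  # Split each string into a number and a unit on whitespace
  parts <- strsplit(trimws(as.character(x)), "\\s+")
  value <- as.numeric(vapply(parts, `[`, character(1), 1))
  # Drop a trailing "s" so "weeks" and "week" share one lookup key
  unit <- sub("s$", "", vapply(parts, `[`, character(1), 2))
  # Fraction of a year per unit (365.25-day average year)
  per_year <- c(day = 1 / 365.25, week = 7 / 365.25, month = 1 / 12, year = 1)
  # Unknown units and missing values yield NA
  value * unname(per_year[unit])
}

age_to_years(c("3 months", "4 years", "3 weeks", "1 year"))
```

Because unrecognised units come back as NA rather than a silent default, they can be spotted afterwards with is.na().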

Upvotes: 3

David Rubinger

Reputation: 3938

lubridate-based solution:

library(tidyverse)
library(lubridate)
dat <- tibble(age_text = c("3 months", "4 years", "3 weeks"))  # tibble() replaces the deprecated data_frame()
dat %>% mutate(age_in_years = duration(age_text) / dyears(1))

Upvotes: 2
