Daniel
Daniel

Reputation: 97

How to calculate with different columns of dates

I have a df called t_compl with three columns of dates.

date_op_indl    date_compl     usdato.x
1984-11-22  1984-11-22     1983-09-07
2004-11-16  NA             1994-10-27
1996-09-10  1996-09-10     1982-11-09
1986-05-24  1986-05-24     1982-11-22
1989-08-22  1989-06-13     1983-02-11

I want to create a fourth variable (t_compl$time) with the time difference between usdato.x and date_compl or date_op_indl in years. I want to choose the earliest date from date_compl and date_op_indl.

I have tried

t_compl$time<- ifelse((t_compl$date_compl-t_compl$usdato.x)<=(t_compl$date_op_indl-t_compl$usdato.x), ((t_compl$date_compl-t_compl$usdato.x)/365.25), ((t_compl$date_op_indl-t_compl$usdato.x)/365.25))

Because some date_compl are missing I want to use compl_op_indl for the calculation

t_compl$time[is.na(t_compl$time)] <- ((t_compl$date_op_indl-t_compl$usdato.x)/365.25)

and then get this error

Warning message:
In t_compl$time[is.na(t_compl$time)] <- ((t_compl$date_op_indl -  :
  number of items to replace is not a multiple of replacement length

The time calculations also turn out completely wrong for the compl_date that were NA.

How to make this time difference calculation in R?

Upvotes: 1

Views: 74

Answers (2)

jazzurro
jazzurro

Reputation: 23574

Here is another approach. I converted character to date then calculated the time difference. Since you said you would use year as unit, I have /365 in mutate.

library(dplyr)

mydf %>%
    mutate_each(funs(as.Date(.))) %>%
    mutate(time = ifelse(date_compl %in% NA, (date_op_indl - usdato.x) / 365,
                     (date_compl - usdato.x) / 365))

#  date_op_indl date_compl   usdato.x      time
#1   1984-11-22 1984-11-22 1983-09-07  1.210959
#2   2004-11-16       <NA> 1994-10-27 10.063014
#3   1996-09-10 1996-09-10 1982-11-09 13.846575
#4   1986-05-24 1986-05-24 1982-11-22  3.504110
#5   1989-08-22 1989-06-13 1983-02-11  6.339726

Upvotes: 1

Roland
Roland

Reputation: 132969

DF <- read.table(text="date_op_indl    date_compl     usdato.x
1984-11-22  1984-11-22     1983-09-07
2004-11-16  NA             1994-10-27
1996-09-10  1996-09-10     1982-11-09
1986-05-24  1986-05-24     1982-11-22
1989-08-22  1989-06-13     1983-02-11", header=TRUE)

DF[] <- lapply(DF, as.Date)

Use pmin to calculate the minimum date for each observation (thereby ignoring NA values):

DF$time <- difftime(do.call(pmin, c(DF[, c("date_op_indl", "date_compl")], na.rm = TRUE)),
                    DF$usdato.x, units="days")
#   date_op_indl date_compl   usdato.x      time
# 1   1984-11-22 1984-11-22 1983-09-07  442 days
# 2   2004-11-16       <NA> 1994-10-27 3673 days
# 3   1996-09-10 1996-09-10 1982-11-09 5054 days
# 4   1986-05-24 1986-05-24 1982-11-22 1279 days
# 5   1989-08-22 1989-06-13 1983-02-11 2314 days

Upvotes: 1

Related Questions