erty234
erty234

Reputation: 11

Calculating Average Time between Dates

I'm having difficulty calculating the average time between the payment dates for my csv. I have tried multiple methods that I have seen online (changing to data.table, using ddply) with no success

WorkerID  PaymentDate
 1       2015-07-18
 1       2015-08-18
 3       2015-09-18
 4       2015-10-18
 4       2015-11-18

This is an example of my dataset- I wanted to calculate the average time between the PaymentDates (in number of days), in the simplest way possible. I would like to group by the workerID. Thank you!

Upvotes: 1

Views: 64

Answers (2)

Oliver
Oliver

Reputation: 8602

An alternative to AkselA's answer, one can use the data.table package if one prefers this over base R.

This is similar to using aggregate, but may sometimes give a speed boost. In my example below I've handled single times by setting the difference to 0, to illustrate how this can be achieved.

library(lubridate)
library(data.table)
df <- fread("WorkerID  PaymentDate
 1       2015-07-18
 1       2015-08-18
 3       2015-09-18
 4       2015-10-18
 4       2015-11-18")
df[,PaymentDate := as.Date(PaymentDate)]
df[,{
    if(length(PaymentDate) > 1){
        mean(diff(as.numeric(PaymentDate)))
    }else
        0
}, by = WorkerID]

Upvotes: 0

AkselA
AkselA

Reputation: 8846

This is a perfect job for aggregate(). It groups PaymentDate by WorkerID and applies the function mean(diff(.)) to each group.

tt <- read.table(text="
WorkerID  PaymentDate
 1       2015-06-18
 1       2015-07-18
 1       2015-08-18
 2       2015-09-18
 3       2015-08-18
 3       2015-09-18
 4       2015-10-18
 4       2015-11-18
 4       2015-12-16", header=TRUE)

tt$PaymentDate <- as.Date(tt$PaymentDate)

aggregate(PaymentDate ~ WorkerID, data=tt, FUN=function(x) mean(diff(x)))
#   WorkerID PaymentDate
# 1        1       30.5 
# 2        2        NaN 
# 3        3       31.0 
# 4        4       29.5 

Upvotes: 0

Related Questions