user82383

Reputation: 899

How to calculate the difference between dates in R for each unique id

I am new to R and have the following data of user names and the dates on which they used a product (truncated output):

Name,  Date 
Jane,  01-24-2016 10:02:00
Mary,  01-01-2016 12:18:00
Mary,  01-01-2016 13:18:00
Mary,  01-02-2016 13:18:00
Jane,  01-23-2016 10:02:00

I would like to analyse the differences between dates, in particular the number of days between consecutive uses for each user. I'd like to plot a histogram to see whether there is a pattern among users.

  1. How do I compute the difference between dates for each user in R?
  2. Are there any other visualizations besides histograms I should explore?

Thanks

Upvotes: 2

Views: 1366

Answers (2)

Zheyuan Li

Reputation: 73265

Try this, assuming your data frame is df:

## in case you have different column names
colnames(df) <- c("Name", "Date")

## you might also have Date as factors when reading in data
## the following ensures it is character string
df$Date <- as.character(df$Date)

## convert to Date object (this drops the time of day)
## see ?strptime for the available formats
## see ?as.Date for Date objects
df$Date <- as.Date(df$Date, format = "%m-%d-%Y %H:%M:%S")

## reorder so that dates are ascending within each person (see Jane)
## this is necessary, otherwise negative numbers occur after differencing
## see ?order on ordering
df <- df[order(df$Name, df$Date), ]

## take day lags per person
## see ?diff for taking difference
## see ?tapply for applying FUN on grouped data
## as.integer() makes output clean
## if unsure, compare with: lags <- with(df, tapply(Date, Name, FUN = diff))
lags <- with(df, tapply(Date, Name, FUN = function (x) as.integer(diff(x))))

For your truncated data (with 5 rows), I get:

> lags
$Jane
[1] 1

$Mary
[1] 0 1

lags is a list. To get Jane's information, use lags$Jane. To get a histogram for her, use hist(lags$Jane). If you simply want a histogram for all clients, ignoring individual differences, use hist(unlist(lags)). unlist() collapses a list into a single vector.
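
To make the steps above easy to try, here is a minimal self-contained sketch that rebuilds the five rows from the question and runs the same pipeline. The data frame is typed in by hand (an assumption about how your data looks once read in), and the plot title and axis label are only illustrative.

```r
## the five truncated rows from the question, typed in by hand
df <- data.frame(
  Name = c("Jane", "Mary", "Mary", "Mary", "Jane"),
  Date = c("01-24-2016 10:02:00", "01-01-2016 12:18:00",
           "01-01-2016 13:18:00", "01-02-2016 13:18:00",
           "01-23-2016 10:02:00"),
  stringsAsFactors = FALSE
)

## parse to Date (time of day is dropped), then sort within each person
df$Date <- as.Date(df$Date, format = "%m-%d-%Y %H:%M:%S")
df <- df[order(df$Name, df$Date), ]

## day gaps between consecutive uses, one list element per person
lags <- with(df, tapply(Date, Name, FUN = function(x) as.integer(diff(x))))

## pool everyone's gaps into one vector and plot them
all_lags <- unlist(lags)
hist(all_lags, main = "Days between consecutive uses", xlab = "Days")
```

With only five rows the histogram is not informative, but the same code should carry over unchanged to the full data.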


comments:

  1. Regarding your request for a good R reference, see CRAN: R intro, and Advanced R;
  2. Using tapply with multiple indices? You can try the trick I gave: use paste to first construct an auxiliary index;
  3. Er, it looks like I made things more complicated than necessary by using density, the central limit theorem, etc. for visualization, so I removed my other answer.
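
The paste trick from comment 2 can be sketched roughly as follows. The ID column and its values are hypothetical (the question's data has no such column); they are added only to show a second grouping variable. simplify = FALSE is used so the result stays a list even when every group happens to return a single number.

```r
## the question's five rows plus a hypothetical second grouping column ID
df <- data.frame(
  Name = c("Jane", "Mary", "Mary", "Mary", "Jane"),
  ID   = c("a", "a", "a", "b", "a"),   # hypothetical, for illustration only
  Date = as.Date(c("01-24-2016 10:02:00", "01-01-2016 12:18:00",
                   "01-01-2016 13:18:00", "01-02-2016 13:18:00",
                   "01-23-2016 10:02:00"),
                 format = "%m-%d-%Y %H:%M:%S"),
  stringsAsFactors = FALSE
)

## sort by both grouping columns, then by date
df <- df[order(df$Name, df$ID, df$Date), ]

## auxiliary index: one string per (Name, ID) combination
key <- paste(df$Name, df$ID, sep = "_")

## day gaps per combination; a group with a single row gives integer(0)
lags <- tapply(df$Date, key, FUN = function(x) as.integer(diff(x)),
               simplify = FALSE)
```

Pick a separator that cannot appear inside the values themselves, otherwise two different combinations could end up with the same key.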

Upvotes: 2

akrun

Reputation: 886948

We can use data.table with lubridate:

library(lubridate)
library(data.table)
setDT(df1)[order(mdy_hms(Date)),  .(Diff=as.integer(diff(as.Date(mdy_hms(Date))))), Name]
#    Name Diff
#1: Mary    0
#2: Mary    1
#3: Jane    1

If there are several grouping variables, e.g. an additional "ID" column, we can place them all in by:

setDT(df1)[order(mdy_hms(Date)),  .(Diff=as.integer(diff(as.Date(mdy_hms(Date))))), 
                                        by = .(Name, ID)]
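
As a rough, self-contained sketch of the grouped version, we can rebuild the question's five rows and add an ID column. The ID values here are hypothetical, made up only to illustrate a second grouping variable, and both packages are assumed to be installed.

```r
library(lubridate)
library(data.table)

## the question's five rows plus a hypothetical ID column
df1 <- data.frame(
  Name = c("Jane", "Mary", "Mary", "Mary", "Jane"),
  ID   = c("a", "a", "a", "b", "a"),   # hypothetical, for illustration only
  Date = c("01-24-2016 10:02:00", "01-01-2016 12:18:00",
           "01-01-2016 13:18:00", "01-02-2016 13:18:00",
           "01-23-2016 10:02:00"),
  stringsAsFactors = FALSE
)

## order by timestamp, then take day differences within each (Name, ID)
res <- setDT(df1)[order(mdy_hms(Date)),
                  .(Diff = as.integer(diff(as.Date(mdy_hms(Date))))),
                  by = .(Name, ID)]
res
```

A group with a single row, such as Mary with ID "b" here, has no consecutive dates to difference, so it contributes no rows to the result.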

Upvotes: 2
