daveDo
daveDo

Reputation: 63

Labeling Time Between Visits

I have a dataframe where I have a customer number and the date they made a visit.

soTable <- data.frame(customer = c(1,1,1,1,1,2,2,2,3,3,4,4,4,4,5),
                      visit_date = c("12/4/2016","12/5/2016","12/6/2016","12/8/2016","12/22/2016",
                               "12/6/2016","12/9/2016","12/15/2016",
                               "12/4/2016","12/12/2016",
                               "12/4/2016","12/22/2016","12/23/2016","12/28/2016","12/5/2016"))

First I need to label the visit which I can do with a loop but I was wondering if there is some dplyr/data.table method which would be quicker. The result looking like the following:

   customer visit_date visitNumber
1         1  12/4/2016           1
2         1  12/5/2016           2
3         1  12/6/2016           3
4         1  12/8/2016           4
5         1 12/22/2016           5
6         2  12/6/2016           1
7         2  12/9/2016           2
8         2 12/15/2016           3
9         3  12/4/2016           1
10        3 12/12/2016           2
11        4  12/4/2016           1
12        4 12/22/2016           2
13        4 12/23/2016           3
14        4 12/28/2016           4
15        5  12/5/2016           1

Then I would need to find the average time between visit number which would look like the following

  visitNumber averageTimeBetween
1           1                 2
2           2                 4
3           3                 5
4           4                 7
5           5                 8

Upvotes: 0

Views: 179

Answers (1)

MrFlick
MrFlick

Reputation: 206263

Here's how to calculate the time between visit. First, make sure your visit date is a proper date format

soTable <- transform(soTable , visit_date = as.Date(visit_date, format="%m/%d/%Y"))

then you can use dplyr

library(dplyr)

soTable %>% group_by(customer) %>% arrange(customer, visit_date) %>% 
  mutate(visit_number=seq_along(visit_date), 
      time_since=visit_date-lag(visit_date)) %>% 
group_by(visit_number) %>% 
summarize(mean=mean(time_since))

For your sample data this returns

# A tibble: 5 × 2
  visit_number           mean
         <int>         <time>
1            1        NA days
2            2  7.500000 days
3            3  2.666667 days
4            4  3.500000 days
5            5 14.000000 days

Upvotes: 4

Related Questions