Reputation: 63
I have a dataframe where I have a customer number and the date they made a visit.
soTable <- data.frame(customer = c(1,1,1,1,1,2,2,2,3,3,4,4,4,4,5),
visit_date = c("12/4/2016","12/5/2016","12/6/2016","12/8/2016","12/22/2016",
"12/6/2016","12/9/2016","12/15/2016",
"12/4/2016","12/12/2016",
"12/4/2016","12/22/2016","12/23/2016","12/28/2016","12/5/2016"))
First I need to label the visit which I can do with a loop but I was wondering if there is some dplyr/data.table method which would be quicker. The result looking like the following:
customer visit_date visitNumber
1 1 12/4/2016 1
2 1 12/5/2016 2
3 1 12/6/2016 3
4 1 12/8/2016 4
5 1 12/22/2016 5
6 2 12/6/2016 1
7 2 12/9/2016 2
8 2 12/15/2016 3
9 3 12/4/2016 1
10 3 12/12/2016 2
11 4 12/4/2016 1
12 4 12/22/2016 2
13 4 12/23/2016 3
14 4 12/28/2016 4
15 5 12/5/2016 1
Then I would need to find the average time between visit number which would look like the following
visitNumber averageTimeBetween
1 1 2
2 2 4
3 3 5
4 4 7
5 5 8
Upvotes: 0
Views: 179
Reputation: 206263
Here's how to calculate the time between visit. First, make sure your visit date is a proper date format
soTable <- transform(soTable , visit_date = as.Date(visit_date, format="%m/%d/%Y"))
then you can use dplyr
library(dplyr)
soTable %>% group_by(customer) %>% arrange(customer, visit_date) %>%
mutate(visit_number=seq_along(visit_date),
time_since=visit_date-lag(visit_date)) %>%
group_by(visit_number) %>%
summarize(mean=mean(time_since))
For your sample data this returns
# A tibble: 5 × 2
visit_number mean
<int> <time>
1 1 NA days
2 2 7.500000 days
3 3 2.666667 days
4 4 3.500000 days
5 5 14.000000 days
Upvotes: 4