AKSHMKP
AKSHMKP

Reputation: 11

Performing in group operations in R

I have a data in which I have 2 fields in a table sf -> Customer id and Buy_date. Buy_date is unique but for each customer, but there can be more than 3 different values of Buy_dates for each customer. I want to calculate difference in consecutive Buy_date for each Customer and its mean value. How can I do this.

Example

Customer   Buy_date
1          2018/03/01
1          2018/03/19
1          2018/04/3
1          2018/05/10
2          2018/01/02
2          2018/02/10
2          2018/04/13

I want the results for each customer in the format

Customer  mean

Upvotes: 0

Views: 987

Answers (2)

Adam Warner
Adam Warner

Reputation: 1354

I am not exactly sure of the desired output but this what I think you want.

library(dplyr)
library(zoo)
dat <- read.table(text = 
"Customer   Buy_date
1          2018/03/01
1          2018/03/19
1          2018/04/3
1          2018/05/10
2          2018/01/02
2          2018/02/10
2          2018/04/13", header = T, stringsAsFactors = F)


dat$Buy_date <- as.Date(dat$Buy_date)

dat %>% group_by(Customer) %>% mutate(diff_between = as.vector(diff(zoo(Buy_date), na.pad=TRUE)), 
                                      mean_days = mean(diff_between, na.rm = TRUE))

This produces:

    Customer Buy_date   diff_between mean_days
     <int> <date>            <dbl>     <dbl>
1        1 2018-03-01           NA      23.3
2        1 2018-03-19           18      23.3
3        1 2018-04-03           15      23.3
4        1 2018-05-10           37      23.3
5        2 2018-01-02           NA      50.5
6        2 2018-02-10           39      50.5
7        2 2018-04-13           62      50.5

EDITED BASED ON USER COMMENTS:

Because you said that you have factors and not characters just convert them by doing the following:

dat$Buy_date <- as.Date(as.character(dat$Buy_date))
dat$Customer <- as.character(dat$Customer)

Upvotes: 0

symbolrush
symbolrush

Reputation: 7457

Here's a dplyr solution.

Your data:

df <- data.frame(Customer = c(1,1,1,1,2,2,2), Buy_date = c("2018/03/01", "2018/03/19", "2018/04/3", "2018/05/10", "2018/01/02", "2018/02/10", "2018/04/13"))

Grouping, mean Buy_date calculation and summarising:

library(dplyr)
df %>% group_by(Customer) %>% mutate(mean = mean(as.POSIXct(Buy_date))) %>% group_by(Customer, mean) %>% summarise()

Output:

# A tibble: 2 x 2
# Groups:   Customer [?]
  Customer mean               
     <dbl> <dttm>             
1        1 2018-03-31 06:30:00
2        2 2018-02-17 15:40:00

Or as @r2evans points out in his comment for the consecutive days between Buy_dates:

df %>% group_by(Customer) %>% mutate(mean = mean(diff(as.POSIXct(Buy_date)))) %>% group_by(Customer, mean) %>% summarise()

Output:

# A tibble: 2 x 2
# Groups:   Customer [?]
  Customer mean            
     <dbl> <time>          
1        1 23.3194444444444
2        2 50.4791666666667

Upvotes: 1

Related Questions