novice_coder

Reputation: 181

Calculate the mean number of days between dates per group

My data are as follows:

year group date 
2019 A     2019-07-15
2019 A     2019-07-25
2019 A     2019-08-01
2019 B     2019-07-15
2019 B     2019-07-30
2020 A     2020-08-01
2020 A     2020-08-03
2020 B     2020-08-01
2020 B     2020-08-20
2020 B     2020-08-25

I would like to calculate the mean number of days between consecutive dates, per year and per group. I tried the following code and received the following error:

data_meandays <- data %>%
  group_by(year, group)%>% 
  mutate(Difference = date - lag(date)) %>%
  summarize(mean_time = mean(Difference, na.rm=TRUE))

Error in date - lag(date) : 
  non-numeric argument to binary operator

The class of my date column is Date.

Upvotes: 1

Views: 68

Answers (1)

akrun

Reputation: 887108

The error occurs because the date column is character class, not Date class, and subtraction is not defined for character vectors. We need to convert the column to Date class before taking the difference:

library(dplyr)
data %>%
   mutate(date = as.Date(date)) %>% 
   group_by(year, group) %>% 
   mutate(Difference = date - lag(date)) %>% 
   summarize(mean_time = mean(Difference, na.rm=TRUE), .groups = 'drop')

-output

# A tibble: 4 × 3
   year group mean_time
  <int> <chr> <drtn>   
1  2019 A      8.5 days
2  2019 B     15.0 days
3  2020 A      2.0 days
4  2020 B     12.0 days

NOTE: the difference between two dates is a difftime object. To convert it to numeric class, apply as.numeric to the column.
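A minimal sketch of that conversion, assuming a small stand-in for the data (only the 2019 rows) and a hypothetical new column name, mean_days. Passing units = "days" to as.numeric makes the unit explicit instead of relying on whatever unit the difftime happens to carry:

```r
library(dplyr)

# Stand-in for the 2019 rows of the data; dates are stored as character
data <- data.frame(
  year  = c(2019L, 2019L, 2019L, 2019L, 2019L),
  group = c("A", "A", "A", "B", "B"),
  date  = c("2019-07-15", "2019-07-25", "2019-08-01",
            "2019-07-15", "2019-07-30"),
  stringsAsFactors = FALSE
)

out <- data %>%
  mutate(date = as.Date(date)) %>%
  group_by(year, group) %>%
  mutate(Difference = date - lag(date)) %>%
  summarize(mean_time = mean(Difference, na.rm = TRUE), .groups = "drop") %>%
  # mean_days is a hypothetical name; it holds plain doubles, not difftime
  mutate(mean_days = as.numeric(mean_time, units = "days"))

out$mean_days
# 8.5 15.0
```

Keeping the difftime column alongside the numeric one is optional; drop mean_time with select(-mean_time) if only the numbers are needed.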


The OP's error can be reproduced if we don't convert to Date class

data %>%  
  group_by(year, group)%>%  
  mutate(Difference = date - lag(date)) %>%  
  summarize(mean_time = mean(Difference, na.rm=TRUE))

Error in mutate():
! Problem while computing Difference = date - lag(date).
ℹ The error occurred in group 1: year = 2019, group = "A".
Caused by error in date - lag(date):
! non-numeric argument to binary operator
Run rlang::last_error() to see where the error occurred
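A quick way to tell which case applies is to inspect the class of the column before subtracting. The sketch below uses a hypothetical two-row data frame, df, with dates stored as text; it is only an illustration of the check, not part of the original pipeline:

```r
# Hypothetical two-row example with dates stored as text
df <- data.frame(date = c("2019-07-15", "2019-07-25"),
                 stringsAsFactors = FALSE)

class_before <- class(df$date)            # "character": date - lag(date) would fail
is_date_before <- inherits(df$date, "Date")   # FALSE

# ISO "YYYY-MM-DD" strings are parsed by as.Date without a format argument
df$date <- as.Date(df$date)

class_after <- class(df$date)             # "Date"
gap <- diff(df$date)                      # difftime between the two rows
```

If class() reports "character" (or "factor") on the real data, the column needs as.Date before any date arithmetic.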

data

data <- structure(list(year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2020L, 
2020L, 2020L, 2020L, 2020L), group = c("A", "A", "A", "B", "B", 
"A", "A", "B", "B", "B"), date = c("2019-07-15", "2019-07-25", 
"2019-08-01", "2019-07-15", "2019-07-30", "2020-08-01", "2020-08-03", 
"2020-08-01", "2020-08-20", "2020-08-25")), 
class = "data.frame", row.names = c(NA, 
-10L))

Upvotes: 1
