matsuo_basho

Reputation: 3010

Using summarise_each with custom function that depends on month of data being summarized

I have a table with 150+ variables recorded daily over 5 years. I would like to compute the daily average of each variable for each year-month. However, if the month is January, May, July, September, November or December, I would like to divide the sum of all values by the count minus one (count - 1) instead of by the count.

dplyr's summarise_each works well for what I want to do. However, I haven't managed to integrate a custom function into the funs argument:

by.ym <- training %>% filter(Day.W!=1) %>% group_by(training, year=year(Date), month=month(Date))

testb <- summarise_each(by.ym[,-c(1:3)], 
                        funs(. / (if (month %in% c(1, 5, 7, 9, 11, 12)) {
                          sum(.)/(nrow(.)-1)
                        } else mean(.))
                        ))

The error message is:

Error: expecting a single value
In addition: Warning messages:
1: In if (c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,  :
  the condition has length > 1 and only the first element will be used
2: In if (c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,  :
  the condition has length > 1 and only the first element will be used

Upvotes: 0

Views: 860

Answers (1)

bramtayl

Reputation: 4024

Putting the suggestions from the comments together, and using iris as test data:

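library(dplyr)
library(tidyr)

# bevel is 1 for the months whose divisor should be n - 1, and 0 otherwise
multipliers = tibble(
  month = 1:12,
  bevel = c(1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1)
)

iris %>%
  select(-Species) %>%
  # iris has no dates, so assign months 1 to 12 cyclically as a stand-in
  mutate(month = rep(1:12, length.out = n())) %>%
  # reshape to long format: one row per (month, variable, value)
  gather(variable, value, -month) %>%
  left_join(multipliers, by = "month") %>%
  group_by(month, variable) %>%
  # divide by n - 1 in the flagged months and by n in all others
  summarize(value = sum(value) / (n() - first(bevel))) %>%
  # back to wide format: one column per variable
  spread(variable, value)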
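
The same result might also be reached without reshaping. The following is only a sketch, assuming dplyr 1.0 or later (for across()); short_months and adjusted_mean are hypothetical names introduced here for illustration, not part of dplyr. The helper inspects only the first month value of each group, so the if condition has length one, which avoids the "condition has length > 1" warning from the question.

```r
library(dplyr)

# Months whose divisor is reduced by one (taken from the question).
short_months <- c(1, 5, 7, 9, 11, 12)

# Hypothetical helper: sum / (count - 1) for flagged months, plain mean otherwise.
# `m` is the month column of the current group; it is constant within a group,
# so only its first element is checked.
adjusted_mean <- function(x, m) {
  if (m[1] %in% short_months) sum(x) / (length(x) - 1) else mean(x)
}

# Same iris stand-in as above: months 1 to 12 assigned cyclically.
result <- iris %>%
  select(-Species) %>%
  mutate(month = rep(1:12, length.out = n())) %>%
  group_by(month) %>%
  summarise(across(everything(), ~ adjusted_mean(.x, month)))
```

With real data, the grouping would presumably be on year and month derived from the Date column, and the helper would receive that month column in the same way.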

Upvotes: 1
