Samet Sökel

Reputation: 2670

Creating a summary table by dividing values by the average of the surrounding observations in data.table

A minimal sample of my data is:

library(data.table)

set.seed(1)

tbl <- data.table(store = c('A','A','A','A','A','A','A','B','B','B','B','B','B','C','C','C','C'),
                  year = c(2017,2017,2017,2017,2018,2018,2018,2017,2017,2017,2017,2017,2017,2017,2017,2017,2018),
                  week = c(12,13,15,16,2,3,4,18,19,20,22,24,25,1,2,3,2),
                  insert = sample(0:1,size = 17,replace = T),
                  demand = sample(200:250,size = 17,replace=T))

I'd like to create a summary table which is calculated in the following way:

To calculate the effect where the insert column is 1, I must divide the demand value of that row by the average of the demand values of the nearest surrounding weeks within the same year whose insert value is 0.

For example, if I need to calculate it for store A in 2017 at the 3rd week, it must be:

rate <- 209 / mean(c(206,245))

But if the year has more than one insert, like store B in 2017, it should be the mean of the rates (for weeks 19 and 20):

rate <- mean(224 / mean(c(240,236)), 245 / mean(c(240,236)))

And if I can't find two values around it, I have to calculate the ratio by dividing by the only value I do find. For example, store C in 2017:

rate <- 243 / 224

If I can find no value around the inserted row, I need to pass 1.
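In other words, for a single insert row the rule boils down to something like the sketch below (rate_one is only meant to spell the rule out, not the grouped solution I'm looking for; the 230 in the last call is just an arbitrary demand value):

rate_one <- function(insert_demand, surrounding_demand) {
  surrounding_demand <- surrounding_demand[!is.na(surrounding_demand)]
  if (length(surrounding_demand) == 0) return(1)  # nothing around the insert: pass 1
  insert_demand / mean(surrounding_demand)
}

rate_one(209, c(206, 245))  # store A, 2017 example: ~0.927
rate_one(243, 224)          # store C, 2017 example: ~1.08
rate_one(230, numeric(0))   # no surrounding value at all: 1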

Finally, the summary table should look like this:

desired_tbl <- data.table(store = c('A','A','B','C','C'),
                          year = c(2017,2018,2017,2017,2018),
                          rate = c(0.927,0.97,0.941,1.08,1))

desired_tbl

  store  year  rate
  <chr> <dbl> <dbl>
1 A      2017 0.927
2 A      2018 0.97 
3 B      2017 0.941
4 C      2017 1.08 
5 C      2018 1    

I can do all of this by writing for loops and tons of conditions, but I'm looking for a vectorised way to accomplish it. dplyr solutions are also welcome.

Thanks in advance.

Upvotes: 0

Views: 75

Answers (2)

G. Grothendieck

Reputation: 270045

Replace the demand values having insert values of 1 with NA, giving d, and then fill in d forwards and backwards by group, taking the mean of the two directions and dividing it into demand, giving rate. Then keep only the insert 1 rows and aggregate rate by store/year group using mean on what is left. If any rate is not finite, use 1.

library(dplyr)
library(zoo)

tbl %>%
  mutate(d = ifelse(insert == 1, NA, demand)) %>%
  group_by(store, year) %>%
  mutate(rate = demand / 
    rowMeans(cbind(na.locf0(d), na.locf0(d, fromLast = TRUE)), na.rm = TRUE)) %>%
  filter(insert == 1) %>%
  summarize(rate = mean(rate, na.rm = TRUE), .groups = "drop") %>%
  mutate(rate = ifelse(is.finite(rate), rate, 1))

giving:

# A tibble: 5 x 3
  store  year  rate
  <chr> <dbl> <dbl>
1 A      2017 0.927
2 A      2018 0.973
3 B      2017 0.985
4 C      2017 1.08 
5 C      2018 1    
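For reference, the rowMeans line works because na.locf0 fills the NA left by an insert row from each direction; a small standalone illustration using the 206/245 pair from the store A 2017 example in the question:

library(zoo)

d <- c(206, NA, 245)          # demand with the insert row blanked out
na.locf0(d)                   # 206 206 245  (carry the previous value forward)
na.locf0(d, fromLast = TRUE)  # 206 245 245  (carry the next value backward)
rowMeans(cbind(na.locf0(d), na.locf0(d, fromLast = TRUE)), na.rm = TRUE)
# 206.0 225.5 245.0           (225.5 is the mean of the two surrounding weeks)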

Upvotes: 1

langtang

Reputation: 24845

Here is one possible way:

  • First, identify the store-years where the insert=1 rows are consecutive, and collapse them (i.e., take their average). In your example, there is only one like this (2017-B, weeks 19/20). Take these collapsed rows and bind them back to the insert=0 rows, reordering by store, year, week:
tbl[, `:=`(demand=as.double(demand), insert_id=rleid(insert))]
tbl <- rbind(
  tbl[insert==0],
  tbl[insert==1][, lapply(.SD, mean, na.rm=T), by=.(store,year,insert_id)]
)[order(store,year,week)]
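Here rleid() just numbers consecutive runs of insert, so back-to-back insert=1 weeks share an insert_id and are averaged together; a quick standalone illustration:

rleid(c(0, 1, 1, 0, 1))
# [1] 1 2 2 3 4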
  • Next, get the values before and after the insert=1 rows, compute the mean (m) of those values by row, take the ratio of demand to m, and retain the insert=1 rows (see the note on shift() after the output):
tbl[, c("v1","v2"):=shift(demand,c(-1,1)), by=.(store, year)]
tbl[, m:=mean(c(v1,v2), na.rm=T), by=1:nrow(tbl)]
tbl[, rate:=demand/m][insert==1,.(rate=mean(fifelse(is.na(rate),1,rate))), by=.(store,year)]

Output:

    store  year      rate
   <char> <num>     <num>
1:      A  2017 0.9268293
2:      A  2018 0.9727273
3:      B  2017 0.9852941
4:      C  2017 1.0848214
5:      C  2018 1.0000000

It's not identical to your desired output, because I believe you have miscalculated 2017-B.
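As a side note on the mechanics, shift(demand, c(-1, 1)) returns a list holding the lead and the lag of demand within each store/year, which is why v1 and v2 can be assigned in one step; a quick illustration with three arbitrary demand values:

shift(c(240, 224, 236), c(-1, 1))
# [[1]] 224 236  NA   (demand of the following row)
# [[2]]  NA 240 224   (demand of the preceding row)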

Upvotes: 1
