RayVelcoro

Reputation: 534

Summing in a list in dplyr

I have a tibble with the date, the day of the quarter (day_num), and a list column (sales_day_list) that flags each day in the quarter as a sales day or not (1/0). I would like to get the number of sales days left in the quarter by summing sales_day_list from position day_num to the end. So for the first day of Q2 1993, I'd like to sum elements 1 through the end of sales_day_list to get 64. On day 2, I'd like to sum from element 2 through the end, and so on.

Here is an example of my data (the output from dput):

my_data= structure(list(DW_DATE_ID = structure(c(733622400, 733708800, 
733795200, 733881600, 733968000, 734054400), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), day_num = 1:6, sales_day_list = list(
    c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 
    1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 
    1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 
    0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 
    1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0), c(1, 0, 0, 
    1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 
    0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 
    1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 
    1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 
    0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0), c(1, 0, 0, 1, 1, 1, 
    1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 
    1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 
    0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 
    1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 
    1, 1, 1, 0, 0, 1, 1, 1, 0), c(1, 0, 0, 1, 1, 1, 1, 1, 0, 
    0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 
    1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 
    1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 
    1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 
    0, 0, 1, 1, 1, 0), c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 
    1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 
    1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 
    0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 
    1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 
    1, 1, 0), c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 
    0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 
    1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 
    1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 
    0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0))), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Views: 41

Answers (2)

Jon Spring

Reputation: 66500

First we can make a table with one row per day of the quarter, and count how many sales days are left in the quarter from each day onward (a reverse cumulative sum).

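library(dplyr)

sales_days <- my_data[1, 3] %>%            # Grab row 1, column 3 (the list column)
  tidyr::unnest(cols = sales_day_list) %>% # One row per day of the quarter
  rename(sales_day = sales_day_list) %>%
  mutate(day_num = row_number()) %>%
  arrange(desc(day_num)) %>%               # Reverse so cumsum runs from the quarter's end
  mutate(remaining_sales = cumsum(sales_day)) %>%
  arrange(day_num)                         # Restore chronological order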

Then we can attach this to my_data:

my_data2 <- my_data[, 1:2] %>%
  left_join(sales_days, by = "day_num")  # Name the join key explicitly

head(my_data2)
# A tibble: 6 x 4
  DW_DATE_ID          day_num sales_day remaining_sales
  <dttm>                <int>     <dbl>           <dbl>
1 1993-04-01 00:00:00       1         1              64
2 1993-04-02 00:00:00       2         0              63
3 1993-04-03 00:00:00       3         0              63
4 1993-04-04 00:00:00       4         1              63
5 1993-04-05 00:00:00       5         1              62
6 1993-04-06 00:00:00       6         1              61

Upvotes: 1

Rich Pauloo

Reputation: 8402

Upon inspection, the sales_day_list is the same for each row, which makes sense since the sales days shouldn't change throughout the quarter.
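That observation can be checked with a quick identity test before relying on a single element. The sketch below does not use the real tibble: it builds a stand-in `my_data` from the weekly pattern visible in the dput output (an assumption about the full data), and `all_same` is just an illustrative name.

```r
# Stand-in for the question's data: six rows, each holding the same 91-day vector.
# Assumed pattern: 1,0,0, then twelve weeks of 1,1,1,1,1,0,0, then 1,1,1,0.
quarter <- c(1, 0, 0, rep(c(1, 1, 1, 1, 1, 0, 0), 12), 1, 1, 1, 0)
my_data <- data.frame(day_num = 1:6)
my_data$sales_day_list <- rep(list(quarter), 6)

# TRUE only when every list element is identical to the first one.
first <- my_data$sales_day_list[[1]]
all_same <- all(vapply(my_data$sales_day_list,
                       function(v) identical(v, first),
                       logical(1)))
all_same
```

If this returns FALSE on real data, using only the first element would give wrong counts for some rows, and a per-row calculation would be the safer choice.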

To solve this problem, get the day of the quarter and the sales days as vectors, then write a function to apply the sum across the sales days vector, subsetting it by the day of the quarter.

x <- my_data$day_num             # days of the quarter as a vector
y <- my_data$sales_day_list[[1]] # sales-day flags; any row works since they are identical
sapply(x, function(d) sum(y[d:length(y)])) # sum from day d to the end: the desired output
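As an alternative to looping with sapply, the same numbers can be obtained from a single reverse cumulative sum. This is only a sketch: `y` here is a stand-in built from the pattern visible in the dput output (an assumption), and `remaining_all` and `remaining` are illustrative names.

```r
# Stand-in for my_data$sales_day_list[[1]] and my_data$day_num.
y <- c(1, 0, 0, rep(c(1, 1, 1, 1, 1, 0, 0), 12), 1, 1, 1, 0)
x <- 1:6

# Reverse, accumulate, reverse again:
# remaining_all[i] equals sum(y[i:length(y)]) for every position i.
remaining_all <- rev(cumsum(rev(y)))

# Pick out the entries for the days of interest.
remaining <- remaining_all[x]
remaining
# 64 63 63 63 62 61
```

This computes the remaining count for every day of the quarter in one pass, which may matter if the tibble covers many days.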

You can assign the result of the last line to a new column in your data frame.
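A minimal sketch of that assignment, written so that each row uses its own list element rather than assuming all of them are identical. The data is a stand-in built from the pattern in the dput output, and the column name `remaining_sales` is an assumption borrowed from the other answer.

```r
# Stand-in for the question's data (assumed weekly pattern, six rows).
quarter <- c(1, 0, 0, rep(c(1, 1, 1, 1, 1, 0, 0), 12), 1, 1, 1, 0)
my_data <- data.frame(day_num = 1:6)
my_data$sales_day_list <- rep(list(quarter), 6)

# For each row, sum that row's own vector from position day_num to the end,
# and store the result as a new column.
my_data$remaining_sales <- mapply(
  function(v, d) sum(v[d:length(v)]),
  my_data$sales_day_list,
  my_data$day_num
)

my_data[, c("day_num", "remaining_sales")]
```

The same `mapply()` call should also work inside `dplyr::mutate()` if you prefer to stay in a pipeline.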

Upvotes: 1
