Rnovice
Rnovice

Reputation: 353

Create weekly cumulative totals from R dataframe

I have a dataframe that has launch weeks for products across markets. Here is a snapshot of the dataframe.

    Prod_ID        Market_Name         START_WEEK
    11044913000    PHOENIX, AZ         1397
    11044913000    WEST TEX/NEW MEX    1206
    11159402003    PORTLAND,OR         1188
    11159402003    SEATTLE/TACOMA      1188
    11159402003    SPOKANE             1195
    11159410010    PORTLAND,OR         1186
    11159410010    SALT LAKE CITY      1190
    11159410010    SEATTLE/TACOMA      1186
    11159410010    SPOKANE             1187
    11159410010    WEST TEX/NEW MEX    1197
    11159410014    PORTLAND,OR         1198
    11159410014    SEATTLE/TACOMA      1239

I would like to create another dataframe which will give me for each Prod_ID, cumulative totals of number of markets a product has been launched in on a weekly basis for first 6 weeks. For the above snippet of data, the output should like something like this.

    Prod_ID        Week1    Week2    Week3    Week4    Week5    Week6
    11044913000    1        1        1        1        1        1
    11159402003    2        2        2        2        2        2
    11159410010    2        3        3        3        4        4
    11159410014    1        1        1        1        1        1

For ease of displaying, I have shown the output only till Week 6, but I need to track till Week 12 for my need. Week is denoted by a 4 digit number in my dataset and is not in date format. Please note that not all products have the same starting week, so I need to infer the earliest week for a Prod_IDfrom the START_WEEK variable. And then identify the next 6 weeks to generate the total number of markets launched in each week.

Any help to do this is appreciated.

Upvotes: 0

Views: 413

Answers (1)

hpesoj626
hpesoj626

Reputation: 3629

I think I understand your problem. Here is my shot. There are several phases to this solution.

The first step is to calculate the cumulative sum of markets for the weeks and the week number for each Prod_ID since they opened. This is done with the following code chunk.

df1 <- df %>%
  group_by(Prod_ID, START_WEEK) %>%
  count() %>%
  arrange(Prod_ID, START_WEEK) %>%
  ungroup() %>%
  group_by(Prod_ID) %>%
  mutate(tot_market = cumsum(n)) %>%
  ungroup() %>%
  group_by(Prod_ID) %>%
  mutate(min_START_WEEK = min(START_WEEK)) %>%
  mutate(week = START_WEEK - min_START_WEEK + 1) 
df1

# # A tibble: 10 x 6
# # Groups:   Prod_ID [4]
#         Prod_ID START_WEEK     n tot_market min_START_WEEK  week
#           <dbl>      <int> <int>      <int>          <dbl> <dbl>
#  1 11044913000.       1206     1          1          1206.    1.
#  2 11044913000.       1397     1          2          1206.  192.
#  3 11159402003.       1188     2          2          1188.    1.
#  4 11159402003.       1195     1          3          1188.    8.
#  5 11159410010.       1186     2          2          1186.    1.
#  6 11159410010.       1187     1          3          1186.    2.
#  7 11159410010.       1190     1          4          1186.    5.
#  8 11159410010.       1197     1          5          1186.   12.
#  9 11159410014.       1198     1          1          1198.    1.
# 10 11159410014.       1239     1          2          1198.   42.

The second phase is to expand the week and Prod_ID to the maximum number of weeks in week.

df2 <- expand.grid(min(df1$week):max(df1$week), unique(df1$Prod_ID))
colnames(df2) <- c("week", "Prod_ID")

The third phase is done by merging df1 and df2 and using zoo::locf to fill the NA's in tot_market (total market) by Prod_ID with the preceding value.

df2 %>% left_join(df1) %>% select(-START_WEEK, -n, -min_START_WEEK) %>%
  group_by(Prod_ID) %>%
  arrange(Prod_ID, week) %>%
  mutate(tot_market = zoo::na.locf(tot_market)) %>%
  spread(week, tot_market) %>%
  ungroup() %>%
  mutate_at(vars(Prod_ID), as.character) %>%
  rename_if(is.integer, function(x) paste0("Week", x))

# # A tibble: 4 x 193
#   Prod_ID     Week1 Week2 Week3 Week4 Week5 Week6 Week7 Week8 Week9 Week10 Week11
#   <chr>       <int> <int> <int> <int> <int> <int> <int> <int> <int>  <int>  <int>
# 1 11044913000     1     1     1     1     1     1     1     1     1      1      1
# 2 11159402003     2     2     2     2     2     2     2     3     3      3      3
# 3 11159410010     2     3     3     3     4     4     4     4     4      4      4
# 4 11159410014     1     1     1     1     1     1     1     1     1      1      1
# # ... with 181 more variables

Upvotes: 1

Related Questions