Reputation: 155
I have a dataframe, lets call it df1, that looks something like this:
product_key month price productage
00020e32-8ecd53a64715 201508 65.00000 1
00020e32-8ecd53a64715 201509 65.00000 2
00020e32-8ecd53a64715 201510 65.00000 3
000340b8-60fb50bacac8 201504 55.00000 1
000340b8-60fb50bacac8 201505 55.00000 2
000340b8-60fb50bacac8 201506 53.16667 3
000340b8-60fb50bacac8 201507 27.50000 4
000340b8-60fb50bacac8 201508 27.50000 5
000340b8-60fb50bacac8 201509 27.50000 6
000340b8-60fb50bacac8 201510 27.50000 7
000458f1-9304a2fdb6ae 201506 49.00000 1
000458f1-9304a2fdb6ae 201507 49.00000 2
000458f1-9304a2fdb6ae 201508 49.00000 3
000458f1-9304a2fdb6ae 201509 49.00000 4
000458f1-9304a2fdb6ae 201510 49.00000 5
What I want to do is filter out all of the products that have been in the dataset for 1 month (e.g. filter(productage ==1)
) and then create a Unit Value Index from those items and their prices. Then I want to do the same for products that have been in the dataset for 2 months and then 3 months and so on...
What I have done so far, but is long-winded is:
MONTH 1
df1month1 <- df1 %>%
filter(productage == 1)
MONTHLY AVERAGES OF PRICES PER PRODUCT
df1_UVIMONTH1<-df1month1%>%
group_by(month)%>%
summarise(aveprice=mean(price))
UVI of MONTH 1, calculating UVI price indices
df1UVIMONTH1<-df1_UVIMONTH1%>%
mutate(month=as.numeric(month))%>%
arrange(month)%>%
mutate(UVI=(aveprice/lag(aveprice)))%>%
mutate(UVI=case_when(month==min(month)~1,
month!=min(month)~ UVI))%>%
mutate(chained=cumprod(UVI))
However, to do this for each product age within the dataset (there can be up to 26) and for 10 different datasets is long and tedious. I'm trying to make this process more efficient but am struggling.
I've attempted to create a function:
product_by_age <- function(df1, age){
filter_by_month <- df1 %>%
filter(productage %in% age) %>%
group_by(month) %>%
summarise(aveprice=mean(price))
UVI_index <- filter_by_month %>%
mutate(month=as.numeric(month))%>%
arrange(month)%>%
mutate(UVI=(aveprice/lag(aveprice)))%>%
mutate(UVI=case_when(month==min(month)~1,
month!=min(month)~ UVI))%>%
mutate(chained=cumprod(UVI))
}
df1productage <- data.frame(age = unique(df1$productage), stringsAsFactors = FALSE)
result <- data.frame()
for (i in df1productage:length(df1productage)) {
sba <- product_by_age(df1, df1productage[i])
result <- rbind(result, sba)
}
But it's not working for me. Please help! If anyone can think of a better way to approach this then let me know. I don't mind if you completely redo the function, either.
To recreate my sample dataset you can use:
product_key <- c(“00020e32-8ecd53a64715”, “00020e32-8ecd53a64715”, ”00020e32-8ecd53a64715”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”, “000458f1-9304a2fdb6ae”, “000458f1-9304a2fdb6ae”, “000458f1-9304a2fdb6ae”, “000458f1-9304a2fdb6ae”, ”000458f1-9304a2fdb6ae”)
month <- c("201508", "201509", "201510", "201504", "201505", "201506", "201507", "201508", "201509", "201510", "201506", "201507", "201508", "201509", "201510")
price <- c("65", "65", "65", "55", "55", "53.16667", "27.5", "27.5", "27.5", "27.5", "49", "49", "49", "49", "49")
productage <- c("1", "2", "3", "1", "2", "3", "4", "5", "6", "7", "1", "2", "3", "4", "5")
df1 <- data.frame(product_key, month, price, productage)
Upvotes: 2
Views: 41
Reputation: 887128
We need change the loop a bit. Assuming that we are looping through the sequence of rows in 'df1productage', and 'result' is initialized as a blank data.frame,
for(i in seq_len(nrow(df1productage))) {
result <- rbind(result, product_by_age(df1, df1productage$age[i]))
}
dim(result)
#[1] 15 4
Or using the tidyverse
way
library(tidyverse)
map_df(df1productage %>%
pull(age), ~
product_by_age(df1, .x), .id = 'grp')
# A tibble: 15 x 5
# grp month aveprice UVI chained
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 55 1 1
# 2 1 3 49 0.891 0.891
# 3 1 5 65 1.33 1.18
# 4 2 2 55 1 1
# 5 2 4 49 0.891 0.891
# 6 2 6 65 1.33 1.18
# 7 3 3 53.2 1 1
# 8 3 5 49 0.922 0.922
# 9 3 7 65 1.33 1.22
#10 4 4 27.5 1 1
#11 4 6 49 1.78 1.78
#12 5 5 27.5 1 1
#13 5 7 49 1.78 1.78
#14 6 6 27.5 1 1
#15 7 7 27.5 1 1
EDIT: Added an identifier column within map_df
Upvotes: 1
Reputation: 23747
It works with grouping, without new function!
require(dplyr)
df1%>%
group_by(month, productage)%>%
summarise(aveprice=mean(price)) %>% arrange(productage, month) %>%
group_by(productage)%>%
mutate(UVI=c(1, aveprice[2:length(aveprice)]/aveprice[1:length(aveprice)-1])) %>%
mutate(chained=cumprod(UVI))
### Group and then regroup. and I have modified your mutate code which was using 'lag'
# A tibble: 15 x 5
# Groups: productage [7]
month productage aveprice UVI chained
<dbl> <chr> <dbl> <dbl> <dbl>
1 201504 1 55.0 1.00 1.00
2 201506 1 49.0 0.891 0.891
3 201508 1 65.0 1.33 1.18
4 201505 2 55.0 1.00 1.00
5 201507 2 49.0 0.891 0.891
6 201509 2 65.0 1.33 1.18
7 201506 3 53.2 1.00 1.00
8 201508 3 49.0 0.922 0.922
9 201510 3 65.0 1.33 1.22
10 201507 4 27.5 1.00 1.00
11 201509 4 49.0 1.78 1.78
12 201508 5 27.5 1.00 1.00
13 201510 5 49.0 1.78 1.78
14 201509 6 27.5 1.00 1.00
15 201510 7 27.5 1.00 1.00
Now you can simply use split
in order to split by column productage
Upvotes: 1