Jay J
Jay J

Reputation: 155

Using dplyr in a function in R, and then a for loop to carry out the function

I have a dataframe, lets call it df1, that looks something like this:

product_key              month    price     productage

00020e32-8ecd53a64715   201508  65.00000    1
00020e32-8ecd53a64715   201509  65.00000    2
00020e32-8ecd53a64715   201510  65.00000    3
000340b8-60fb50bacac8   201504  55.00000    1
000340b8-60fb50bacac8   201505  55.00000    2
000340b8-60fb50bacac8   201506  53.16667    3
000340b8-60fb50bacac8   201507  27.50000    4
000340b8-60fb50bacac8   201508  27.50000    5
000340b8-60fb50bacac8   201509  27.50000    6
000340b8-60fb50bacac8   201510  27.50000    7
000458f1-9304a2fdb6ae   201506  49.00000    1
000458f1-9304a2fdb6ae   201507  49.00000    2
000458f1-9304a2fdb6ae   201508  49.00000    3
000458f1-9304a2fdb6ae   201509  49.00000    4
000458f1-9304a2fdb6ae   201510  49.00000    5

What I want to do is filter out all of the products that have been in the dataset for 1 month (e.g. filter(productage ==1)) and then create a Unit Value Index from those items and their prices. Then I want to do the same for products that have been in the dataset for 2 months and then 3 months and so on...

What I have done so far, but is long-winded is:

MONTH 1

df1month1 <- df1 %>%
filter(productage == 1)

MONTHLY AVERAGES OF PRICES PER PRODUCT

df1_UVIMONTH1<-df1month1%>%
  group_by(month)%>%
  summarise(aveprice=mean(price))

UVI of MONTH 1, calculating UVI price indices

  df1UVIMONTH1<-df1_UVIMONTH1%>%
  mutate(month=as.numeric(month))%>%
  arrange(month)%>%
  mutate(UVI=(aveprice/lag(aveprice)))%>%
  mutate(UVI=case_when(month==min(month)~1,
                       month!=min(month)~ UVI))%>%
  mutate(chained=cumprod(UVI))

However, to do this for each product age within the dataset (there can be up to 26) and for 10 different datasets is long and tedious. I'm trying to make this process more efficient but am struggling.

I've attempted to create a function:

product_by_age <- function(df1, age){
  filter_by_month <- df1 %>%
    filter(productage %in% age) %>%
    group_by(month) %>%
    summarise(aveprice=mean(price))

  UVI_index <- filter_by_month %>%
    mutate(month=as.numeric(month))%>%
    arrange(month)%>%
    mutate(UVI=(aveprice/lag(aveprice)))%>%
    mutate(UVI=case_when(month==min(month)~1,
                         month!=min(month)~ UVI))%>%
    mutate(chained=cumprod(UVI))
}



df1productage <- data.frame(age = unique(df1$productage), stringsAsFactors = FALSE)

result <- data.frame()
for (i in df1productage:length(df1productage)) {
  sba <- product_by_age(df1, df1productage[i])
  result <- rbind(result, sba)
}

But it's not working for me. Please help! If anyone can think of a better way to approach this then let me know. I don't mind if you completely redo the function, either.

To recreate my sample dataset you can use:

product_key <- c(“00020e32-8ecd53a64715”, “00020e32-8ecd53a64715”, ”00020e32-8ecd53a64715”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”, “000340b8-60fb50bacac8”,   “000340b8-60fb50bacac8”, “000458f1-9304a2fdb6ae”, “000458f1-9304a2fdb6ae”, “000458f1-9304a2fdb6ae”, “000458f1-9304a2fdb6ae”, ”000458f1-9304a2fdb6ae”)
month <- c("201508", "201509", "201510", "201504", "201505", "201506", "201507", "201508", "201509", "201510", "201506", "201507", "201508", "201509", "201510")
price <- c("65", "65", "65", "55", "55", "53.16667", "27.5", "27.5", "27.5", "27.5", "49", "49", "49", "49", "49")
productage <- c("1", "2", "3", "1", "2", "3", "4", "5", "6", "7", "1", "2", "3", "4", "5")
df1 <- data.frame(product_key, month, price, productage)

Upvotes: 2

Views: 41

Answers (2)

akrun
akrun

Reputation: 887128

We need change the loop a bit. Assuming that we are looping through the sequence of rows in 'df1productage', and 'result' is initialized as a blank data.frame,

for(i in seq_len(nrow(df1productage))) {
    result <- rbind(result, product_by_age(df1, df1productage$age[i]))
 }

dim(result)
#[1] 15  4

Or using the tidyverse way

library(tidyverse)
map_df(df1productage %>% 
              pull(age), ~    
                    product_by_age(df1, .x), .id = 'grp')
# A tibble: 15 x 5
#   grp   month aveprice   UVI chained
#   <chr> <dbl>    <dbl> <dbl>   <dbl>
# 1 1         1     55   1       1    
# 2 1         3     49   0.891   0.891
# 3 1         5     65   1.33    1.18 
# 4 2         2     55   1       1    
# 5 2         4     49   0.891   0.891
# 6 2         6     65   1.33    1.18 
# 7 3         3     53.2 1       1    
# 8 3         5     49   0.922   0.922
# 9 3         7     65   1.33    1.22 
#10 4         4     27.5 1       1    
#11 4         6     49   1.78    1.78 
#12 5         5     27.5 1       1    
#13 5         7     49   1.78    1.78 
#14 6         6     27.5 1       1    
#15 7         7     27.5 1       1    

EDIT: Added an identifier column within map_df

Upvotes: 1

tjebo
tjebo

Reputation: 23747

It works with grouping, without new function!

require(dplyr)

df1%>%
  group_by(month, productage)%>%
  summarise(aveprice=mean(price)) %>% arrange(productage, month) %>%
    group_by(productage)%>%
    mutate(UVI=c(1, aveprice[2:length(aveprice)]/aveprice[1:length(aveprice)-1])) %>%
  mutate(chained=cumprod(UVI))

 ### Group and then regroup. and I have modified your mutate code which was using 'lag' 

# A tibble: 15 x 5
# Groups:   productage [7]
    month productage aveprice   UVI chained
    <dbl> <chr>         <dbl> <dbl>   <dbl>
 1 201504 1              55.0 1.00    1.00 
 2 201506 1              49.0 0.891   0.891
 3 201508 1              65.0 1.33    1.18 
 4 201505 2              55.0 1.00    1.00 
 5 201507 2              49.0 0.891   0.891
 6 201509 2              65.0 1.33    1.18 
 7 201506 3              53.2 1.00    1.00 
 8 201508 3              49.0 0.922   0.922
 9 201510 3              65.0 1.33    1.22 
10 201507 4              27.5 1.00    1.00 
11 201509 4              49.0 1.78    1.78 
12 201508 5              27.5 1.00    1.00 
13 201510 5              49.0 1.78    1.78 
14 201509 6              27.5 1.00    1.00 
15 201510 7              27.5 1.00    1.00 

Now you can simply use split in order to split by column productage

Upvotes: 1

Related Questions