Rene Chan

Reputation: 985

Apply custom function to dataframe

I need help with something that is not too complex but is new to me. I have a data frame df with a product column Product.id and a price column Price.

Product.id  price
A   11.5
A   11.5
A   12
A   13
A   13
B   9.25
B   9.75
B   9.75
B   9.5

I would like to check whether the price has changed from the previous month using a custom function:

Check.Price.Change <- function(Vector){
  for(x in 1:nrow(Vector)){
    if(Vector[x] != Vector[x-1]){
      TRUE 
    }
  }
}

# check if bucket has changed from previous month

df <- df %>%
  group_by(Product.id) %>%
  mutate(if.Price.change = lapply(Price, Check.Price.Change))

I get the error :

Error in 1:nrow(Vector) : argument of length 0
Called from: FUN(X[[i]], ...)

What would be the right way to do this, please?

Upvotes: 0

Views: 58

Answers (2)

smingerson

Reputation: 1438

The code below adds an indicator column that is TRUE when the previous row's Price matches the current row's Price. lag (and lead) are dplyr functions that let you compare a column's values across rows efficiently. The vectorized if_else, also from dplyr, sets if.Price.change to TRUE if the condition is met, FALSE if not, and NA if the comparison can't be made. Note that the comparison can't be made for the first row of each group, because there is no previous row to pull a value from. Since the condition is lag(Price) == Price, TRUE here means the price is unchanged; use != instead if TRUE should flag a change. As a side note, lag/lead can also compare rows more than one position back or forward via the n argument; the default is 1.

Using dplyr:

library(dplyr)
df <- df %>% group_by(Product.id) %>%
              mutate(if.Price.change = if_else(lag(Price) == Price, TRUE, FALSE, NA)) %>% ungroup()
# A tibble: 9 x 3
#  Product.id Price if.Price.change
#  <fct>      <dbl> <lgl>          
#1 A          11.5  NA             
#2 A          11.5  TRUE           
#3 A          12    FALSE          
#4 A          13    FALSE          
#5 A          13    TRUE           
#6 B           9.25 NA             
#7 B           9.75 FALSE          
#8 B           9.75 TRUE           
#9 B           9.5  FALSE     
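
To see what lag and lead return on their own, here is a minimal sketch on a plain vector. The vector prices is just product A's values from the question, and the variable names are made up for illustration.

```r
library(dplyr)

# Product A's prices from the question (illustrative vector)
prices <- c(11.5, 11.5, 12, 13, 13)

prev1 <- lag(prices)          # NA 11.5 11.5 12 13   (shifted back by one)
prev2 <- lag(prices, n = 2)   # NA NA 11.5 11.5 12   (shifted back by two)
next1 <- lead(prices)         # 11.5 12 13 13 NA     (shifted forward by one)

# TRUE when the price equals the previous one, NA where there is no previous value
unchanged <- if_else(prev1 == prices, TRUE, FALSE, NA)

# TRUE when the price differs from the previous one
changed <- prices != prev1
```

Here unchanged is NA TRUE FALSE FALSE TRUE and changed is NA FALSE TRUE TRUE FALSE, so which comparison you want depends on what TRUE should mean in your column.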

Upvotes: 1

Ronak Shah

Reputation: 388982

We can use lag from dplyr to compare each value with the previous entry in its group.

library(dplyr)
df %>% group_by(Product.id) %>%  mutate(is_changed = price != lag(price))

# Product.id price is_changed
#  <fct>      <dbl> <lgl>     
#1 A          11.5  NA        
#2 A          11.5  FALSE     
#3 A          12    TRUE      
#4 A          13    TRUE      
#5 A          13    FALSE     
#6 B           9.25 NA        
#7 B           9.75 TRUE      
#8 B           9.75 FALSE     
#9 B           9.5  TRUE      

Similarly, data.table has the shift function, whose default type is "lag":

library(data.table)
setDT(df)[, is_changed := price != shift(price), by = Product.id]
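
As a rough illustration of what shift does, here is a small self-contained sketch. The table dt and its four rows are made up for the example and are not the data from the question.

```r
library(data.table)

prices <- c(11.5, 11.5, 12, 13, 13)

shifted_lag  <- shift(prices)                 # NA 11.5 11.5 12 13 (default type = "lag")
shifted_lead <- shift(prices, type = "lead")  # 11.5 12 13 13 NA

# Small made-up table: two products with two rows each
dt <- data.table(Product.id = c("A", "A", "B", "B"),
                 price      = c(11.5, 12, 9.25, 9.25))

# Compare each price with the previous one inside its product group
dt[, is_changed := price != shift(price), by = Product.id]
```

After this, dt$is_changed is NA TRUE NA FALSE: the first row of each product has nothing to compare against, so it stays NA.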

data

df <- structure(list(Product.id = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), price = c(11.5, 
11.5, 12, 13, 13, 9.25, 9.75, 9.75, 9.5)), class = "data.frame", 
row.names = c(NA, -9L))
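
As for the error in the question: nrow() returns NULL for a plain vector, so 1:nrow(Vector) fails with "argument of length 0". On top of that, lapply(Price, ...) calls the function on one value at a time, so it never sees the previous row. If you prefer to stay in base R, a minimal sketch could look like the following. The helper name check_price_change is hypothetical, and ave() is just one way to apply it per group.

```r
prices <- c(11.5, 11.5, 12, 13, 13)
is.null(nrow(prices))   # TRUE: a plain vector has no rows, use length() instead

# Hypothetical helper: TRUE where a value differs from the one before it,
# NA for the first element because it has no predecessor
check_price_change <- function(prices) {
  if (length(prices) == 0) return(logical(0))
  c(NA, prices[-1] != prices[-length(prices)])
}

df <- data.frame(
  Product.id = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  price      = c(11.5, 11.5, 12, 13, 13, 9.25, 9.75, 9.75, 9.5)
)

# ave() applies the helper within each product; it hands back numeric 0/1/NA
# because df$price is numeric, hence the as.logical() conversion
df$is_changed <- as.logical(ave(df$price, df$Product.id, FUN = check_price_change))
```

This gives the same is_changed column as the dplyr version above (NA for the first row of each product, TRUE where the price differs from the previous row).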

Upvotes: 1
