Brisbane Pom

Reputation: 611

Iterative calculation in dplyr using result of previous calculation

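I am looking to perform a calculation on a field in a dataframe with the following logic: if basevalue is not NA, the result is basevalue; otherwise the result is the previous result multiplied by the current row's multiplier.

Assume that the first value is never NA, so there is always a seed value. I would like to perform the calculation by groups of data (dplyr::group_by).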

The following code gives a reprex:

basevalue <- c(2, 5, NA, NA, NA, NA)
multiplier <- c(3.2, 1.1, 1.8, 1.3, 1.5, 1.2)
previous_result <- c(NA, 2, 5, 9, 11.7, 17.55)
result <- c(2, 5, 9, 11.7, 17.55, 21.06)
logic <- c(rep("basevalue != NA, so take base value", 2),
           rep("basevalue == NA, so take lag(result) * multiplier", 4))

dfIn <- data.frame(basevalue, multiplier)
dfOut <- data.frame(basevalue, multiplier, result, previous_result, logic)

Is there a way to achieve this using simple dplyr / base R / tidyverse logic, or do I need to use a specialist package such as zoo?

Upvotes: 1

Views: 443

Answers (2)

David Robinson

Reputation: 78590

You can do this with the accumulate2 function from purrr, which is designed for applying this kind of recursive relationship across two vectors.

library(dplyr)
library(purrr)

calculate <- function(previous, basevalue, multiplier) {
  coalesce(basevalue, multiplier * previous)
}

dfIn %>%
  mutate(lst = accumulate2(basevalue, multiplier[-1], calculate),
         result = unlist(lst))

Two notes:

  • multiplier[-1] drops the first multiplier value, because accumulate2 expects its second vector to be one element shorter than the first when no initial value is supplied. The first multiplier is never used anyway, since there is no "previous" value at that point.
  • accumulate2 returns a list, so unlist() is needed to turn it into a vector.
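
Since the question asks for a per-group calculation, here is a minimal sketch of the same approach under group_by. The grouping column grp and the data frame dfGrouped are made up for illustration and are not part of the original reprex. It assumes the first basevalue in every group is non-NA, as the question states.

```r
library(dplyr)
library(purrr)

# Hypothetical grouped input: `grp` is an assumed grouping column,
# and the first basevalue in each group is the non-NA seed.
dfGrouped <- data.frame(
  grp        = c("a", "a", "a", "b", "b", "b"),
  basevalue  = c(2, NA, NA, 5, NA, NA),
  multiplier = c(3.2, 1.1, 1.8, 1.3, 1.5, 1.2)
)

# Same helper as above: keep basevalue when present,
# otherwise multiply the previous result by the current multiplier.
calculate <- function(previous, basevalue, multiplier) {
  coalesce(basevalue, multiplier * previous)
}

grouped_result <- dfGrouped %>%
  group_by(grp) %>%
  # Inside mutate(), basevalue and multiplier are the current group's
  # slices, so multiplier[-1] drops the first multiplier of each group.
  mutate(result = unlist(accumulate2(basevalue, multiplier[-1], calculate))) %>%
  ungroup()
```

Because the recursion restarts at the top of each group, no value is carried over from one group into the next.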

Upvotes: 3

Ronak Shah

Reputation: 388907

Here is one way to do this with a for loop:

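calculate_result <- function(b, m) {
  # Start from the base values; non-NA entries are kept as they are
  r <- b
  # Fill each NA position, in order, from the previous result
  # (assumes b[1] is not NA, as stated in the question)
  inds <- which(is.na(b))
  for (i in inds) {
    r[i] <- r[i - 1] * m[i]
  }
  return(r)
}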

Apply this function with dplyr, so that you can add group_by later:

library(dplyr)
dfIn %>% mutate(result = calculate_result(basevalue, multiplier))

#  basevalue multiplier result
#1         2        3.2   2.00
#2         5        1.1   5.00
#3        NA        1.8   9.00
#4        NA        1.3  11.70
#5        NA        1.5  17.55
#6        NA        1.2  21.06
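
As a rough sketch of the group_by step mentioned above, the same function can be called inside a grouped mutate. The grp column and the dfGrouped data frame are invented for this example; only calculate_result comes from the answer. It again relies on each group starting with a non-NA basevalue.

```r
library(dplyr)

# Function from the answer, repeated so the example runs on its own
calculate_result <- function(b, m) {
  r <- b
  for (i in which(is.na(b))) {
    r[i] <- r[i - 1] * m[i]
  }
  r
}

# Hypothetical grouped input (not from the original reprex)
dfGrouped <- data.frame(
  grp        = c("a", "a", "a", "b", "b", "b"),
  basevalue  = c(2, NA, NA, 5, NA, NA),
  multiplier = c(3.2, 1.1, 1.8, 1.3, 1.5, 1.2)
)

loop_result <- dfGrouped %>%
  group_by(grp) %>%
  # The function receives one group's vectors at a time,
  # so r[i - 1] never reaches back into the previous group.
  mutate(result = calculate_result(basevalue, multiplier)) %>%
  ungroup()
```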

Upvotes: 0
