Reputation: 611
I want to calculate a field in a data frame with the following logic: if basevalue is not NA, the result is basevalue; otherwise the result is the previous result multiplied by the current row's multiplier.
Assume that the first value is never NA, so there is always a seed value. I would like to perform the calculation by groups of data (dplyr::group_by).
The following code gives a reprex:
basevalue <- c(2,5,NA,NA,NA,NA)
multiplier <- c(3.2,1.1,1.8,1.3,1.5,1.2)
previous_result <- c(NA,2,5,9,11.7,17.55)
result <- c(2,5,9,11.7,17.55,21.06)
logic <- c(rep("basevalue != NA, so take base value",2), rep("basevalue == NA, so take lag(result) * multiplier",4))
dfIn <- data.frame(basevalue,multiplier)
dfOut <- data.frame(basevalue,multiplier, result, previous_result, logic)
Is there a way to achieve this using simple dplyr / base R / tidyverse logic, or do I need to use a specialist package such as zoo?
Upvotes: 1
Views: 443
Reputation: 78590
You can do this with the accumulate2 function from purrr, which is designed for applying this kind of recursive relationship across two vectors.
library(dplyr)
library(purrr)
calculate <- function(previous, basevalue, multiplier) {
  coalesce(basevalue, multiplier * previous)
}
dfIn %>%
  mutate(lst = accumulate2(basevalue, multiplier[-1], calculate),
         result = unlist(lst))
Two notes:
multiplier[-1] throws away the first multiplier value, since accumulate2 expects its second argument to be one element shorter than the first (notice that you'll never use the first multiplier value, since there's no "previous" value at that point).
The output of accumulate2 is a list, so we add unlist() to turn it into a vector.
Upvotes: 3
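Since the question asks for the calculation by group, here is a minimal sketch of the same accumulate2 approach under group_by. The grp column and the dfGrouped data are made up for illustration and are not part of the original reprex; the sketch assumes the first basevalue in every group is not NA.

```r
library(dplyr)
library(purrr)

# Same helper as above: keep basevalue when present,
# otherwise multiply the previous result by the current multiplier.
calculate <- function(previous, basevalue, multiplier) {
  coalesce(basevalue, multiplier * previous)
}

# Hypothetical grouped data (grp is an assumed column name).
dfGrouped <- data.frame(
  grp        = rep(c("a", "b"), each = 3),
  basevalue  = c(2, NA, NA, 10, NA, 4),
  multiplier = c(3.2, 1.5, 2, 1.1, 0.5, 3)
)

out <- dfGrouped %>%
  group_by(grp) %>%
  # Inside a grouped mutate, basevalue and multiplier are the
  # per-group vectors, so the recursion restarts in each group.
  mutate(result = unlist(accumulate2(basevalue, multiplier[-1], calculate))) %>%
  ungroup()

out$result
# 2 3 6 10 5 4
```

Because mutate evaluates the expression once per group, multiplier[-1] drops the first multiplier of each group rather than of the whole data frame.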
Reputation: 388907
Here is one way to do this with a for loop:
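calculate_result <- function(b, m) {
  # Start from the base values; non-NA entries are kept as they are.
  r <- b
  inds <- which(is.na(b))
  # Fill each NA from the previous result (assumes b[1] is not NA).
  for (i in inds) {
    r[i] <- r[i - 1] * m[i]
  }
  return(r)
}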
Applying this function with dplyr so that you can use group_by later:
library(dplyr)
dfIn %>% mutate(result = calculate_result(basevalue, multiplier))
# basevalue multiplier result
#1 2 3.2 2.00
#2 5 1.1 5.00
#3 NA 1.8 9.00
#4 NA 1.3 11.70
#5 NA 1.5 17.55
#6 NA 1.2 21.06
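As a sketch of the grouped case mentioned above, the same function can be called inside a grouped mutate. The grp column and the dfGrouped data are assumptions made up for illustration; the function still relies on the first basevalue of each group not being NA.

```r
library(dplyr)

# Loop-based helper from this answer, repeated so the example is self-contained.
calculate_result <- function(b, m) {
  r <- b
  for (i in which(is.na(b))) {
    r[i] <- r[i - 1] * m[i]
  }
  r
}

# Hypothetical grouped data (grp is an assumed column name).
dfGrouped <- data.frame(
  grp        = rep(c("a", "b"), each = 3),
  basevalue  = c(2, NA, NA, 10, NA, 4),
  multiplier = c(3.2, 1.5, 2, 1.1, 0.5, 3)
)

out <- dfGrouped %>%
  group_by(grp) %>%
  # calculate_result() receives only the rows of the current group.
  mutate(result = calculate_result(basevalue, multiplier)) %>%
  ungroup()

out$result
# 2 3 6 10 5 4
```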
Upvotes: 0