EW1982
EW1982

Reputation: 59

How Can I build a running tally of cumulative multiplied elements in r?

I am working with a large data frame and I need/want to use more efficient code.

This is my starting point:

library(data.table)
dt<-data.table(Customer = c("John","Sally","Michael","David"), 
Premium=c(1000,950,1125,1500),
Factor_1=1.2, 
Factor_2 =c(.98,.95,.9,.75),Factor_3=c(1,1.2,1.4,1.5))

This is the desired result (I want to create Premium_1, Premium_2, Premium_3):

Inefficient_code_answer<-dt%>%
  mutate(Premium_1 = Premium*Factor_1)%>%
  mutate(Premium_2 = Premium*Factor_1*Factor_2)%>%
  mutate(Premium_3 = Premium*Factor_1*Factor_2*Factor_3)

I tried to use purrr

dt%>%
mutate(Premium_3 = Premium * pmap_dbl(list(Factor_1:Factor_3),prod))

But list() does not play well with " : " sequences (Unless I just do not know how).

I have about 25 to 30 factors I need to apply to a base premium, and I need the premium values at each step. I currently have it all typed out in a script, but that is a nightmare to adjust when I want to add or take away a step (or factor).

Thanks

Upvotes: 1

Views: 95

Answers (4)

www
www

Reputation: 39174

We can consider to using for-loop in this case.

for (i in 1:3){
  if (i == 1){
    dt[["Premium_1"]] <- dt[["Premium"]] * dt[["Factor_1"]]
  } else {
    dt[[paste0("Premium_", i)]] <- dt[[paste0("Premium_", i - 1)]] * dt[[paste0("Factor_", i)]]
  }
}

dt
#    Customer Premium Factor_1 Factor_2 Factor_3 Premium_1 Premium_2 Premium_3
# 1:     John    1000      1.2     0.98      1.0      1200      1176    1176.0
# 2:    Sally     950      1.2     0.95      1.2      1140      1083    1299.6
# 3:  Michael    1125      1.2     0.90      1.4      1350      1215    1701.0
# 4:    David    1500      1.2     0.75      1.5      1800      1350    2025.0

Upvotes: 0

chinsoon12
chinsoon12

Reputation: 25223

Another option is to use Reduce:

cols <- grep("^Factor", names(dt), value=TRUE)
dt[, paste0("Premium_", seq_len(length(cols))) := 
        Reduce(`*`, .SD, accumulate=TRUE)[-1L], 
    .SDcols=c("Premium", cols)]

output:

   Customer Premium Factor_1 Factor_2 Factor_3 Premium_1 Premium_2 Premium_3
1:     John    1000      1.2     0.98      1.0      1200      1176    1176.0
2:    Sally     950      1.2     0.95      1.2      1140      1083    1299.6
3:  Michael    1125      1.2     0.90      1.4      1350      1215    1701.0
4:    David    1500      1.2     0.75      1.5      1800      1350    2025.0

Upvotes: 1

A. Suliman
A. Suliman

Reputation: 13135

Here is one option using map_dfc and transmute. Create the column name using !! and quo_name. Create the product using paste0("Premium*", paste0("Factor_",1:.x, collapse = "*")) where .x can take 1,2,...etc, then eval and parse.

library(dplyr)
library(purrr)
res <- map_dfc(1:3, ~dt %>%
                     transmute(!!quo_name(paste0('Premium_',.x)) := 
                     eval(parse(text = paste0("Premium*", paste0("Factor_",1:.x, collapse = "*")))))) 

bind_cols(dt,res) 

   Customer Premium Factor_1 Factor_2 Factor_3 Premium_1 Premium_2 Premium_3
1:     John    1000      1.2     0.98      1.0      1200      1176    1176.0
2:    Sally     950      1.2     0.95      1.2      1140      1083    1299.6
3:  Michael    1125      1.2     0.90      1.4      1350      1215    1701.0
4:    David    1500      1.2     0.75      1.5      1800      1350    2025.0

Upvotes: 1

Marius
Marius

Reputation: 60210

Calculations in R (and particularly dplyr) are often easier when you're working down columns rather than across rows. So you can first convert your data to long format and then use cumprod():

dt_long <- dt %>%
    pivot_longer(
        cols = Factor_1:Factor_3,
        names_to = "Factor",
        names_prefix = "Factor_",
        values_to = "Value"
    )

dt_long <- dt_long %>%
    group_by(Customer) %>%
    mutate(Result = Premium * cumprod(Value))

Output:

# A tibble: 12 x 5
# Groups:   Customer [4]
   Customer Premium Factor Value Result
   <chr>      <dbl> <chr>  <dbl>  <dbl>
 1 John        1000 1       1.2   1200 
 2 John        1000 2       0.98  1176 
 3 John        1000 3       1     1176 
 4 Sally        950 1       1.2   1140 
 5 Sally        950 2       0.95  1083 
 6 Sally        950 3       1.2   1300.
 7 Michael     1125 1       1.2   1350 
 8 Michael     1125 2       0.9   1215 
 9 Michael     1125 3       1.4   1701.
10 David       1500 1       1.2   1800 
11 David       1500 2       0.75  1350.
12 David       1500 3       1.5   2025.

Upvotes: 2

Related Questions