Reputation: 59
I am working with a large data frame and I need/want to use more efficient code.
This is my starting point:
library(data.table)
dt<-data.table(Customer = c("John","Sally","Michael","David"),
Premium=c(1000,950,1125,1500),
Factor_1=1.2,
Factor_2 =c(.98,.95,.9,.75),Factor_3=c(1,1.2,1.4,1.5))
This is the desired result (I want to create Premium_1, Premium_2, Premium_3):
Inefficient_code_answer<-dt%>%
mutate(Premium_1 = Premium*Factor_1)%>%
mutate(Premium_2 = Premium*Factor_1*Factor_2)%>%
mutate(Premium_3 = Premium*Factor_1*Factor_2*Factor_3)
I tried to use purrr
dt%>%
mutate(Premium_3 = Premium * pmap_dbl(list(Factor_1:Factor_3),prod))
But list() does not play well with " : " sequences (Unless I just do not know how).
I have about 25 to 30 factors I need to apply to a base premium, and I need the premium values at each step. I currently have it all typed out in a script, but that is a nightmare to adjust when I want to add or take away a step (or factor).
Thanks
Upvotes: 1
Views: 95
Reputation: 39174
We can consider to using for-loop
in this case.
for (i in 1:3){
if (i == 1){
dt[["Premium_1"]] <- dt[["Premium"]] * dt[["Factor_1"]]
} else {
dt[[paste0("Premium_", i)]] <- dt[[paste0("Premium_", i - 1)]] * dt[[paste0("Factor_", i)]]
}
}
dt
# Customer Premium Factor_1 Factor_2 Factor_3 Premium_1 Premium_2 Premium_3
# 1: John 1000 1.2 0.98 1.0 1200 1176 1176.0
# 2: Sally 950 1.2 0.95 1.2 1140 1083 1299.6
# 3: Michael 1125 1.2 0.90 1.4 1350 1215 1701.0
# 4: David 1500 1.2 0.75 1.5 1800 1350 2025.0
Upvotes: 0
Reputation: 25223
Another option is to use Reduce
:
cols <- grep("^Factor", names(dt), value=TRUE)
dt[, paste0("Premium_", seq_len(length(cols))) :=
Reduce(`*`, .SD, accumulate=TRUE)[-1L],
.SDcols=c("Premium", cols)]
output:
Customer Premium Factor_1 Factor_2 Factor_3 Premium_1 Premium_2 Premium_3
1: John 1000 1.2 0.98 1.0 1200 1176 1176.0
2: Sally 950 1.2 0.95 1.2 1140 1083 1299.6
3: Michael 1125 1.2 0.90 1.4 1350 1215 1701.0
4: David 1500 1.2 0.75 1.5 1800 1350 2025.0
Upvotes: 1
Reputation: 13135
Here is one option using map_dfc
and transmute
. Create the column name using !!
and quo_name
. Create the product using paste0("Premium*", paste0("Factor_",1:.x, collapse = "*"))
where .x
can take 1,2,...etc, then eval
and parse
.
library(dplyr)
library(purrr)
res <- map_dfc(1:3, ~dt %>%
transmute(!!quo_name(paste0('Premium_',.x)) :=
eval(parse(text = paste0("Premium*", paste0("Factor_",1:.x, collapse = "*"))))))
bind_cols(dt,res)
Customer Premium Factor_1 Factor_2 Factor_3 Premium_1 Premium_2 Premium_3
1: John 1000 1.2 0.98 1.0 1200 1176 1176.0
2: Sally 950 1.2 0.95 1.2 1140 1083 1299.6
3: Michael 1125 1.2 0.90 1.4 1350 1215 1701.0
4: David 1500 1.2 0.75 1.5 1800 1350 2025.0
Upvotes: 1
Reputation: 60210
Calculations in R (and particularly dplyr
) are often easier when you're working down columns rather than across rows. So you can first convert your data to long format and then use cumprod()
:
dt_long <- dt %>%
pivot_longer(
cols = Factor_1:Factor_3,
names_to = "Factor",
names_prefix = "Factor_",
values_to = "Value"
)
dt_long <- dt_long %>%
group_by(Customer) %>%
mutate(Result = Premium * cumprod(Value))
Output:
# A tibble: 12 x 5
# Groups: Customer [4]
Customer Premium Factor Value Result
<chr> <dbl> <chr> <dbl> <dbl>
1 John 1000 1 1.2 1200
2 John 1000 2 0.98 1176
3 John 1000 3 1 1176
4 Sally 950 1 1.2 1140
5 Sally 950 2 0.95 1083
6 Sally 950 3 1.2 1300.
7 Michael 1125 1 1.2 1350
8 Michael 1125 2 0.9 1215
9 Michael 1125 3 1.4 1701.
10 David 1500 1 1.2 1800
11 David 1500 2 0.75 1350.
12 David 1500 3 1.5 2025.
Upvotes: 2