Reputation: 31
:) Hi all
yes, I know there are already some tips and tricks out there how to efficiently build a dataframe row by row, but my for loop is still way to slow. Maybe it's obvious for one of you how to speed it up/what's the reason why it's so slow.
As you can see I already converted it to a list, following the instructions of: Creating an R dataframe row-by-row but this does not make it much faster than rbind is.
index = 1
for (i in 1:nrow(predictionDf)) {
startDate = predictionDf$ApropYMD[i]
amountPerMonth = (predictionDf$PredictionExp[i]/12)
amountPerMonthMax = (predictionDf$PredictionMax[i]/12)
print(i)
for (j in 1:12) {
plotDf[index, ] = list(ApropYMD=startDate, AmountExp = amountPerMonth, AmountMax = amountPerMonthMax)
month(startDate) <- month(startDate) + 1
index = index + 1
}
}
I also tried this, which is a little faster
plotDf = data.frame("ApropYMD" = c(seq(firstDayNextMonth, highestDate, by="months")))
plotDf$AmountExp = 0
plotDf$AmountMax = 0
for (i in 1:nrow(tmpPredictionDf)) {
startDate = tmpPredictionDf$ApropYMD[i]
amountPerMonth = (tmpPredictionDf$PredictionExp[i]/12)
amountPerMonthMax = (tmpPredictionDf$PredictionMax[i]/12)
print(i)
for (j in 1:12) {
plotDf$AmountExp[which(plotDf$ApropYMD == startDate)] = plotDf$AmountExp[which(plotDf$ApropYMD == startDate)] + amountPerMonth
plotDf$AmountMax[which(plotDf$ApropYMD == startDate)] = plotDf$AmountMax[which(plotDf$ApropYMD == startDate)] + amountPerMonthMax
month(startDate) <- month(startDate) + 1
}
I want to spread the Amount of one date over the next 12 months for each of the ~5.500 rows -> 5.500*12 = 66.000 row inserts
Input
ApropYMD AmountExp AmountMax
2019-01-01 6000 12000
Output
ApropYMD AmountExp AmountMax
2019-01-01 500 1000
2019-01-02 500 1000
2019-01-03 500 1000
2019-01-04 500 1000
2019-01-05 500 1000
2019-01-06 500 1000
2019-01-07 500 1000
2019-01-08 500 1000
2019-01-09 500 1000
2019-01-10 500 1000
2019-01-11 500 1000
2019-01-12 500 1000
Upvotes: 3
Views: 200
Reputation: 887223
We can do this without any grouping. Do the division at the beginning and then use map
to expand the 'ApropYMD' and unnest
library(tidyverse)
df %>%
mutate_at(2:3, list(~ ./ 12)) %>%
mutate(ApropYMD = map(ApropYMD, seq, length.out = 12, by = "month")) %>%
unnest
# AmountExp AmountMax ApropYMD
#1 500 1000 2019-01-01
#2 500 1000 2019-02-01
#3 500 1000 2019-03-01
#4 500 1000 2019-04-01
#5 500 1000 2019-05-01
#6 500 1000 2019-06-01
#7 500 1000 2019-07-01
#8 500 1000 2019-08-01
#9 500 1000 2019-09-01
#10 500 1000 2019-10-01
#11 500 1000 2019-11-01
#12 500 1000 2019-12-01
#13 1000 2000 2018-01-01
#14 1000 2000 2018-02-01
#15 1000 2000 2018-03-01
#16 1000 2000 2018-04-01
#17 1000 2000 2018-05-01
#18 1000 2000 2018-06-01
#19 1000 2000 2018-07-01
#20 1000 2000 2018-08-01
#21 1000 2000 2018-09-01
#22 1000 2000 2018-10-01
#23 1000 2000 2018-11-01
#24 1000 2000 2018-12-01
df <- structure(list(ApropYMD = structure(c(17897, 17532), class = "Date"),
AmountExp = c(6000L, 12000L), AmountMax = c(12000L, 24000L
)), row.names = c(NA, -2L), class = "data.frame")
Upvotes: 1
Reputation: 389047
Populating dataframe in a loop is not recommended and it is usually slow. We can use pre-existing functions to do that. For example, one way to do this with tidyr::complete
. Assuming you have ApropYMD
of "Date" class we can expand every row to 12 rows by grouping every row and creating a sequence of date for every month and then divide the first
value of AmountExp
and AmountMax
by 12 so that it is equally distributed.
library(dplyr)
library(tidyr)
df %>%
group_by(group = row_number()) %>%
complete(ApropYMD = seq(ApropYMD, length.out = 12, by = "month"),
fill = list(AmountExp = 0, AmountMax = 0)) %>%
mutate(AmountExp = first(AmountExp)/12,
AmountMax = first(AmountMax)/12) %>%
ungroup() %>%
select(-group)
# A tibble: 24 x 3
# ApropYMD AmountExp AmountMax
# <date> <dbl> <dbl>
# 1 2019-01-01 500 1000
# 2 2019-02-01 500 1000
# 3 2019-03-01 500 1000
# 4 2019-04-01 500 1000
# 5 2019-05-01 500 1000
# 6 2019-06-01 500 1000
# 7 2019-07-01 500 1000
# 8 2019-08-01 500 1000
# 9 2019-09-01 500 1000
#10 2019-10-01 500 1000
# … with 14 more rows
data
Added one more additional row to test the solution
df <- read.table(text = "ApropYMD AmountExp AmountMax
2019-01-01 6000 12000
2018-01-01 12000 24000", header = T)
df$ApropYMD <- as.Date(df$ApropYMD)
Upvotes: 2