I have a table of historical daily price data for multiple products, with NA for future dates. Another column holds the expected price-increase factor for a product on a given future date; the increase is applied to the prior day's price.
I have written a for-loop to calculate the expected prices, but it runs very slowly over the ~500,000 records it loops through. All historical prices are populated, while every projected price starts out as NA.
Example of current table (old_table):
date       product   price   incr_amt
======================================
...        ...       ...     ...
10/14/19   prod1     50      1.0
10/15/19   prod1     50      1.0
10/16/19   prod1     NA      1.0
...        ...       ...     ...
04/01/20   prod1     NA      1.05
04/02/20   prod1     NA      1.0
...        ...       ...     ...
10/14/19   prod2     35      1.0
10/15/19   prod2     35      1.0
10/16/19   prod2     NA      1.0
...        ...       ...     ...
01/01/20   prod2     NA      1.02
01/02/20   prod2     NA      1.0
...        ...       ...     ...
My current code groups by product; then, for each row where price is NA, it computes the price as the lagged price * incr_amt and recalculates the lagged price for the next iteration, looping until every row in the table has been processed.
Example result (new_table):
date       product   price   incr_amt
======================================
...        ...       ...     ...
10/14/19   prod1     50      1.0
10/15/19   prod1     50      1.0
10/16/19   prod1     50      1.0
...        ...       ...     ...
04/01/20   prod1     52.5    1.05
04/02/20   prod1     52.5    1.0
...        ...       ...     ...
10/14/19   prod2     35      1.0
10/15/19   prod2     35      1.0
10/16/19   prod2     35      1.0
...        ...       ...     ...
01/01/20   prod2     35.7    1.02
01/02/20   prod2     35.7    1.0
...        ...       ...     ...
My current code works, but it takes over an hour to run. Because each iteration depends on the previous one and order matters, I don't know whether there is a way around using a loop.
Current Code:
library(tidyverse)

old_table <- tribble(
  ~date,        ~product, ~price, ~incr_amt,
  "2019-10-14", "prod1",  50,     1.0,
  "2019-10-15", "prod1",  50,     1.0,
  "2019-10-16", "prod1",  NA,     1.0,
  "2019-10-17", "prod1",  NA,     1.0,
  "2019-10-18", "prod1",  NA,     1.0,
  "2019-10-19", "prod1",  NA,     1.05,
  "2019-10-20", "prod1",  NA,     1.0,
  "2019-10-21", "prod1",  NA,     1.0,
  "2019-10-14", "prod2",  35,     1.0,
  "2019-10-15", "prod2",  35,     1.0,
  "2019-10-16", "prod2",  NA,     1.0,
  "2019-10-17", "prod2",  NA,     1.0,
  "2019-10-18", "prod2",  NA,     1.0,
  "2019-10-19", "prod2",  NA,     1.0,
  "2019-10-20", "prod2",  NA,     1.0,
  "2019-10-21", "prod2",  NA,     1.02,
  "2019-10-22", "prod2",  NA,     1.0
)

# Seed the lagged price within each product
new_table <- old_table %>%
  group_by(product) %>%
  mutate(lag_price = lag(price))

# Fill each NA price from the prior row's price, one row at a time
for (i in 1:nrow(new_table)) {
  if (!is.na(new_table$price[[i]]))
    next
  new_table$price[[i]] <- new_table$lag_price[[i]] * new_table$incr_amt[[i]]
  # Rebuild the entire lag column after every single fill --
  # this whole-table modification on each iteration is the bottleneck
  new_table$lag_price <- lag(new_table$price)
}
How can I improve this process? Thanks.
Answer:
Here's a vectorized solution that I expect will be much faster. (I'd be curious how much faster it is on your real data.) The main thing slowing your code down is, as @aocall notes, the 500,000 table modifications; applying the same calculation to the whole table at once should be much faster.

Here, we calculate the cumulative growth across each missing section within each product. (We also unnecessarily calculate the growth across the non-missing sections, but I assume that overhead is minimal.) Then we apply that growth factor to the last available price to get each filled-in value.
library(dplyr)

new_table2 <- old_table %>%
  # Put together streaks of missingness & track cumulative growth in them
  group_by(product) %>%
  mutate(missing_streak = cumsum(is.na(price) != is.na(lag(price)))) %>%
  # Fill in NA with the last observed value
  mutate(price_new = price) %>%
  tidyr::fill(price_new) %>%
  # Gross up based on cumulative growth within each streak
  group_by(product, missing_streak) %>%
  mutate(cuml_growth = cumprod(incr_amt)) %>%
  mutate(price_new = if_else(is.na(price),
                             price_new * cuml_growth,
                             price)) %>%
  ungroup()
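To make the mechanics concrete, here are the intermediate columns the pipeline produces for prod1, worked out by hand from the sample data above (rows 3-8 form one missing streak, so their growth factors compound off the last observed price of 50):

date        price  missing_streak  price_new(filled)  cuml_growth  price_new(final)
2019-10-14   50          1               50               1.00          50
2019-10-15   50          1               50               1.00          50
2019-10-16   NA          2               50               1.00          50
2019-10-17   NA          2               50               1.00          50
2019-10-18   NA          2               50               1.00          50
2019-10-19   NA          2               50               1.05          52.5
2019-10-20   NA          2               50               1.05          52.5
2019-10-21   NA          2               50               1.05          52.5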
Seems to work on your data:
identical(new_table$price, new_table2$price_new)
[1] TRUE
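If you want to quantify the speedup on your real data, the bench package gives a quick comparison. A minimal sketch, assuming you wrap each approach in a function first (run_loop() and run_vectorized() are hypothetical wrappers, not functions defined above):

library(bench)
# Hypothetical wrappers: run_loop() should contain the original for-loop,
# run_vectorized() the dplyr pipeline above; both take and return a table.
bench::mark(
  loop       = run_loop(old_table),
  vectorized = run_vectorized(old_table),
  check = FALSE  # the outputs use different column names, so skip the equality check
)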