Reputation: 85
The following code does exactly what I want; however, the for loop is far too slow. On my machine, the wall time for the for loop is 1min 5s. I'm looking for an alternative to the for loop that is much faster.
# Imports
from sympy.solvers.solveset import solveset_real
from sympy import Symbol, Eq
# Define variables
initial_value = 1
rate = Symbol('r')
decay_obs_window = 1480346
target_decay = .15
# Solver to calculate decay rate
decay_rate = solveset_real(Eq((initial_value - rate * decay_obs_window), target_decay), rate).args[0]
# Generate weights
weights = []
for i in range(5723673):
# How to handle data BEYOND decay_obs_window
if i > decay_obs_window and target_decay == 0:
# Record a weight of zero
weights.append(0)
elif i > decay_obs_window and target_decay > 0:
# Record the final target weight
weights.append(decayed_weight)
# How to handle data WITHIN decay_obs_window
else:
# Calculate the new slightly decayed weight
decayed_weight = 1 - (decay_rate * i)
weights.append(decayed_weight)
weights[0:10]
I wrote this list comprehension with the hope of improving the execution time. While it works perfectly, it does not yield any appreciable runtime improvement over the for loop 😞:
weights = [0 if i > decay_obs_window and target_decay == 0 else decayed_weight if i > decay_obs_window and target_decay > 0 else (decayed_weight := 1 - (decay_rate * i)) for i in range(len(weights_df))]
I'm interested in any approaches that would help speed this up. Thank you 🙏!
FINAL SOLUTION:
This was the final solution that I settled on. On my machine, the wall time to execute the entire thing is only 425 ms. It's a slightly modified version of Aaron's proposed solution.
import numpy as np
from sympy.solvers.solveset import solveset_real
from sympy import Symbol, Eq
# Define variables
initial_value = 1
rate = Symbol('r')
decay_obs_window = 1480346
target_decay = .15
# Instantiate weights array
weights = np.zeros(5723673)
# Solver to calculate decay rate
decay_rate = solveset_real(Eq((initial_value - rate * decay_obs_window), target_decay), rate).args[0]
# Fix a bug where numpy doesn't like sympy floats :(
decay_rate = float(decay_rate)
# How to weight observations WITHIN decay_obs_window
weights[:decay_obs_window + 1] = 1 - np.arange(decay_obs_window + 1) * decay_rate
# How to weight observations BEYOND decay_obs_window
weights[decay_obs_window + 1 : 5723673] = target_decay
weights
Upvotes: 1
Views: 1416
Reputation: 1136
I have what I believe to be an absolutely optimal method here:
# Imports
from sympy.solvers.solveset import solveset_real
from sympy import Symbol, Eq
# Define variables
initial_value = 1
rate = Symbol('r')
decay_obs_window = 1480346
target_decay = .15
# Solver to calculate decay rate
decay_rate = solveset_real(Eq((initial_value - rate * decay_obs_window), target_decay), rate).args[0]
# Generate weights
wLength = 5723673
weights = [1 - (decay_rate * i) for i in range(decay_obs_window + 1)]
extend_length = wLength - decay_obs_window - 1
if target_decay == 0:
weights.extend(0 for _ in range(extend_length))
elif target_decay > 0:
decayed_weight = weights[-1]
weights.extend(decayed_weight for _ in range(extend_length))
This brings all the branching logic out of the loop, so it's only calculated once instead of ~ 1.5 million times.
That said, this still represents nearly no improvement in speed over what you already have. The fact of the matter is that most of your time is spent calculating 1 - (decay_rate * i)
, and there's nothing you can do in python to speed this up.
If you really need more performance you're probably at the point of needing to figure out how to call a C
(or Rust
) library.
Numpy is perfect for this. We can use the fromfunction method to create an array. First import the function:
from numpy import fromfunction
Then replace
weights = [1 - (decay_rate * i) for i in range(decay_obs_window + 1)]
with
weights = fromfunction(lambda i: 1 - (decay_rate * i), (decay_obs_window + 1, )).tolist()
This is likely to represent the fastest you can do this in Python.
Upvotes: 0
Reputation: 11075
TLDR; None of the variables you test against in your if statements ever change during the loop, so you can easily kick the conditional logic out of the loop, and decide beforehand. I also am a huge proponent of numpy and vectorization.
Looking at the logic there aren't too many possible outcomes of what weights
ends up looking like. As RufusVS mentioned, you can separate out the first section where no additional logic is being calculated. It is also a simple linear function, so why not compute it with numpy
which is great for linear algebra:
import numpy as np
weights = np.zeros(5723673)
#Fix a bug where numpy doesn't like sympy floats :(
decay_rate = float(decay_rate)
weights[:decay_obs_window + 1] = 1 - np.arange(decay_obs_window + 1) * decay_rate
Then you can decide what to do with the remaining values based on the value of target_decay
outside of any loops because it never changes:
if target_decay == 0:
pass #weights array started out filled with 0's so we don't need to do anything
elif target_decay > 0:
#fill the rest of the array with the last value of the window
weights[decay_obs_window + 1 : 5723673] = weights[decay_obs_window + 1]
pass
else: #target_decay < 0:
#continue calculating the linear function
weights[decay_obs_window + 1 : 5723673] = 1 - np.arange(decay_obs_window + 1, 5723673) * decay_rate
Upvotes: 1
Reputation: 4137
By breaking it into two loops, you eliminate a lot of comparison to the break point:
# Imports
from sympy.solvers.solveset import solveset_real
from sympy import Symbol, Eq
# Define variables
initial_value = 1
rate = Symbol('r')
decay_obs_window = 1480346
target_decay = .15
# Solver to calculate decay rate
decay_rate = solveset_real(Eq((initial_value - rate * decay_obs_window), target_decay), rate).args[0]
# Generate weights
weights = []
for i in range(decay_obs_window+1):
# Calculate the new slightly decayed weight
decayed_weight = 1 - (decay_rate * i)
weights.append(decayed_weight)
for i in range(decay_obs_window+1, 5723673):
# How to handle data BEYOND decay_obs_window
if target_decay == 0:
weights.append(0)
elif target_decay > 0:
# Record the final target weight
weights.append(decayed_weight)
else:
# Calculate the new slightly decayed weight
decayed_weight = 1 - (decay_rate * i)
weights.append(decayed_weight)
weights[0:10]
Modified to include @MarkSouls comment, and my own further observation:
# Imports
from sympy.solvers.solveset import solveset_real
from sympy import Symbol, Eq
# Define variables
initial_value = 1
rate = Symbol('r')
decay_obs_window = 1480346
target_decay = .15
# Solver to calculate decay rate
decay_rate = solveset_real(Eq((initial_value - rate * decay_obs_window), target_decay), rate).args[0]
TOTAL_ENTRIES = 5723673
# Generate weights
weights = [0]* TOTAL_ENTRIES
for i in range(decay_obs_window+1):
# Calculate the new slightly decayed weight
decayed_weight = 1 - (decay_rate * i)
weights[i]=decayed_weight
if target_decay == 0:
pass
elif target_decay > 0:
for i in range(decay_obs_window+1, TOTAL_ENTRIES):
# Record the final target weight
weights[i]=decayed_weight
else:
for i in range(decay_obs_window+1, TOTAL_ENTRIES):
decayed_weight = 1 - (decay_rate * i)
weights[i]=decayed_weight
weights[0:10]
Upvotes: 0