matteblack
matteblack

Reputation: 85

Alternative Faster Approach to Python For Loop

The following code does exactly what I want; however, the for loop is far too slow. On my machine, the wall time for the for loop is 1min 5s. I'm looking for an alternative to the for loop that is much faster.

# Imports
from sympy.solvers.solveset import solveset_real
from sympy import Symbol, Eq

# Define variables
initial_value = 1
rate = Symbol('r')
decay_obs_window = 1480346
target_decay = .15

# Solver to calculate decay rate
decay_rate = solveset_real(Eq((initial_value - rate * decay_obs_window), target_decay), rate).args[0]

# Generate weights
weights = []
for i in range(5723673):
    # How to handle data BEYOND decay_obs_window
    if i > decay_obs_window and target_decay == 0:
        # Record a weight of zero
        weights.append(0)
    elif i > decay_obs_window and target_decay > 0:
        # Record the final target weight
        weights.append(decayed_weight)
    # How to handle data WITHIN decay_obs_window
    else:
        # Calculate the new slightly decayed weight
        decayed_weight = 1 - (decay_rate * i)
        weights.append(decayed_weight)

weights[0:10]

I wrote this list comprehension with the hope of improving the execution time. While it works perfectly, it does not yield any appreciable runtime improvement over the for loop 😞:

weights = [0 if i > decay_obs_window and target_decay == 0 else decayed_weight if i > decay_obs_window and target_decay > 0 else (decayed_weight := 1 - (decay_rate * i)) for i in range(len(weights_df))]

I'm interested in any approaches that would help speed this up. Thank you 🙏!


FINAL SOLUTION:

This was the final solution that I settled on. On my machine, the wall time to execute the entire thing is only 425 ms. It's a slightly modified version of Aaron's proposed solution.

import numpy as np
from sympy.solvers.solveset import solveset_real
from sympy import Symbol, Eq

# Define variables
initial_value = 1
rate = Symbol('r')
decay_obs_window = 1480346
target_decay = .15

# Instantiate weights array
weights = np.zeros(5723673)

# Solver to calculate decay rate
decay_rate = solveset_real(Eq((initial_value - rate * decay_obs_window), target_decay), rate).args[0]

# Fix a bug where numpy doesn't like sympy floats :(
decay_rate = float(decay_rate)

# How to weight observations WITHIN decay_obs_window
weights[:decay_obs_window + 1] = 1 - np.arange(decay_obs_window + 1) * decay_rate

# How to weight observations BEYOND decay_obs_window
weights[decay_obs_window + 1 : 5723673] = target_decay

weights

Upvotes: 1

Views: 1416

Answers (3)

Turksarama
Turksarama

Reputation: 1136

I have what I believe to be an absolutely optimal method here:

# Imports
from sympy.solvers.solveset import solveset_real
from sympy import Symbol, Eq

# Define variables
initial_value = 1
rate = Symbol('r')
decay_obs_window = 1480346
target_decay = .15

# Solver to calculate decay rate
decay_rate = solveset_real(Eq((initial_value - rate * decay_obs_window), target_decay), rate).args[0]

# Generate weights
wLength = 5723673
weights = [1 - (decay_rate * i) for i in range(decay_obs_window + 1)]
extend_length = wLength - decay_obs_window - 1
if target_decay == 0:
    weights.extend(0 for _ in range(extend_length))
elif target_decay > 0:
    decayed_weight = weights[-1]
    weights.extend(decayed_weight for _ in range(extend_length))

This brings all the branching logic out of the loop, so it's only calculated once instead of ~ 1.5 million times.

That said, this still represents nearly no improvement in speed over what you already have. The fact of the matter is that most of your time is spent calculating 1 - (decay_rate * i), and there's nothing you can do in python to speed this up.

If you really need more performance you're probably at the point of needing to figure out how to call a C (or Rust) library.

Numpy is perfect for this. We can use the fromfunction method to create an array. First import the function:

from numpy import fromfunction

Then replace

weights = [1 - (decay_rate * i) for i in range(decay_obs_window + 1)]

with

weights = fromfunction(lambda i: 1 - (decay_rate * i), (decay_obs_window + 1, )).tolist()

This is likely to represent the fastest you can do this in Python.

Upvotes: 0

Aaron
Aaron

Reputation: 11075

TLDR; None of the variables you test against in your if statements ever change during the loop, so you can easily kick the conditional logic out of the loop, and decide beforehand. I also am a huge proponent of numpy and vectorization.

Looking at the logic there aren't too many possible outcomes of what weights ends up looking like. As RufusVS mentioned, you can separate out the first section where no additional logic is being calculated. It is also a simple linear function, so why not compute it with numpy which is great for linear algebra:

import numpy as np
weights = np.zeros(5723673)
#Fix a bug where numpy doesn't like sympy floats :(
decay_rate = float(decay_rate)
weights[:decay_obs_window + 1] = 1 - np.arange(decay_obs_window + 1) * decay_rate

Then you can decide what to do with the remaining values based on the value of target_decay outside of any loops because it never changes:

if target_decay == 0:
    pass #weights array started out filled with 0's so we don't need to do anything
elif target_decay > 0:
    #fill the rest of the array with the last value of the window
    weights[decay_obs_window + 1 : 5723673] = weights[decay_obs_window + 1]
    pass
else: #target_decay < 0:
    #continue calculating the linear function
    weights[decay_obs_window + 1 : 5723673] = 1 - np.arange(decay_obs_window + 1, 5723673) * decay_rate

Upvotes: 1

RufusVS
RufusVS

Reputation: 4137

By breaking it into two loops, you eliminate a lot of comparison to the break point:

# Imports
from sympy.solvers.solveset import solveset_real
from sympy import Symbol, Eq

# Define variables
initial_value = 1
rate = Symbol('r')
decay_obs_window = 1480346
target_decay = .15

# Solver to calculate decay rate
decay_rate = solveset_real(Eq((initial_value - rate * decay_obs_window), target_decay), rate).args[0]

# Generate weights
weights = []

for i in range(decay_obs_window+1):
    # Calculate the new slightly decayed weight
    decayed_weight = 1 - (decay_rate * i)
    weights.append(decayed_weight)

for i in range(decay_obs_window+1, 5723673):
    # How to handle data BEYOND decay_obs_window
    if target_decay == 0:
        weights.append(0)
    elif target_decay > 0:
        # Record the final target weight
        weights.append(decayed_weight)
    else:
        # Calculate the new slightly decayed weight
        decayed_weight = 1 - (decay_rate * i)
        weights.append(decayed_weight)

weights[0:10]

Modified to include @MarkSouls comment, and my own further observation:

# Imports
from sympy.solvers.solveset import solveset_real
from sympy import Symbol, Eq

# Define variables
initial_value = 1
rate = Symbol('r')
decay_obs_window = 1480346
target_decay = .15

# Solver to calculate decay rate
decay_rate = solveset_real(Eq((initial_value - rate * decay_obs_window), target_decay), rate).args[0]

TOTAL_ENTRIES = 5723673 
# Generate weights
weights = [0]* TOTAL_ENTRIES

for i in range(decay_obs_window+1):
    # Calculate the new slightly decayed weight
    decayed_weight = 1 - (decay_rate * i)
    weights[i]=decayed_weight

if target_decay == 0:
    pass
elif target_decay > 0:
    for i in range(decay_obs_window+1, TOTAL_ENTRIES):
        # Record the final target weight
        weights[i]=decayed_weight
else:
    for i in range(decay_obs_window+1, TOTAL_ENTRIES):
        decayed_weight = 1 - (decay_rate * i)
        weights[i]=decayed_weight

weights[0:10]

Upvotes: 0

Related Questions