Reputation: 61

R for-loop vs Python for-loop Performance

There is already some discussion on this topic, but it doesn't quite address my question. Sorry in advance if it does and I didn't realize.

Here are two simple for-loop setups, one in R and one in Python:

R for-loop (took 3.41s on my computer):

datafr  <- matrix(0,nrow=24*365,ncol=15)
matrix3d  <- array(0,dim=c(24*365,12,7))

#================
start_time <- Sys.time()
for (p in 1:150) {
  for (m in 1:2) {
    l  <- rep(0.7*runif(365),each=24)
    a  <- rep(0.7*runif(365),each=24)
    pp <- 1+floor(15*runif(7))
    for (j in 1:7) {
      bun     <- datafr[,pp[j]]*a
      for (h in 2:(24*365)) {
        matrix3d[h,m,j] <- matrix3d[h-1,m,j]*l[h] + bun[h]
      }  
    }
  }
}
Sys.time() - start_time
#================
#took 3.41s on my computer

And here's the same code in Python (took 17.87s on my computer):

import numpy as np
import time
import pandas as pd

datafr= pd.DataFrame(0, index=range(24*365),columns=range(15))
matrix3d = np.zeros((24*365,12,7))

#=============
start_time = time.time()
for p in range(150):
    for m in range(2):
        l = np.repeat(0.7*np.random.random(365),24)
        a = np.repeat(0.7*np.random.random(365),24)
        pp = 1+np.floor(15*np.random.random(7))
        for j in range(7):
            bun = np.asarray(datafr.iloc[:,int(pp[j])-1],dtype=np.float32)*a
            for h in range(1,(24*365)):
                matrix3d[h,m,j] = matrix3d[h-1,m,j]*l[h]+bun[h] #bottleneck
print(round(time.time() - start_time,2))
#================
#took 17.87s on my computer

R is more than 5 times faster than Python here. Is this expected? I had read that Python's for-loops are faster than R's unless you use R's lapply, in which case R beats Python once the number of iterations exceeds 1000 (https://datascienceplus.com/loops-in-r-and-python-who-is-faster/), but that is not what I see here (and I'm not using lapply). Can the Python script be improved without using decorators, magic functions, generators, etc.? I'm simply curious. Thanks.
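A minimal sketch of one plain-Python improvement, with no decorators or generators. Most of the inner loop's cost comes from indexing a NumPy array one element at a time: every read such as matrix3d[h-1,m,j] creates a NumPy scalar object. Converting the inputs to Python lists with tolist() and carrying the previous value in a local float avoids that overhead. The function name recurrence_python is hypothetical, and the inputs below are random arrays with the question's shapes, not the question's data.

```python
import numpy as np


def recurrence_python(l, bun, y0=0.0):
    """Return y with y[0] = y0 and y[h] = y[h-1] * l[h] + bun[h] for h >= 1.

    Converting the NumPy inputs to Python lists first means the loop works on
    plain floats instead of creating a NumPy scalar on every element access.
    """
    l_list = l.tolist()
    bun_list = bun.tolist()
    out = [0.0] * len(l_list)
    prev = float(y0)
    out[0] = prev
    for h in range(1, len(l_list)):
        prev = prev * l_list[h] + bun_list[h]  # same recurrence as the question's inner loop
        out[h] = prev
    return np.array(out)


# Illustrative inputs with the question's shapes (random values, not the question's data)
rng = np.random.default_rng(0)
n = 24 * 365
l = np.repeat(0.7 * rng.random(365), 24)
bun = rng.random(n)
y = recurrence_python(l, bun)
```

Inside the question's loop it could replace the h loop as matrix3d[:, m, j] = recurrence_python(l, bun, matrix3d[0, m, j]). No timing is claimed for it: it removes NumPy's per-element indexing overhead but still runs 8,759 interpreted iterations per slice.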

Upvotes: 1

Answers (2)

Miłosz Bertman

Reputation: 1

Avoid for-loops whenever you can. For almost every task, vectorised operations are far faster once you have more than about 50 elements.
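To illustrate the claim with the question's shapes, here is a minimal sketch (the array names and random inputs are illustrative): computing bun has no dependence between rows, so a Python loop over rows and a single NumPy expression give the same result, and the vectorised form does its looping in compiled code. The printed timings depend on the machine.

```python
import time

import numpy as np

rng = np.random.default_rng(1)
n = 24 * 365
column = rng.random(n)                    # stands in for one data frame column
a = np.repeat(0.7 * rng.random(365), 24)  # hourly scale factors, as in the question

# Element-by-element Python loop
start = time.perf_counter()
bun_loop = np.empty(n)
for h in range(n):
    bun_loop[h] = column[h] * a[h]
loop_seconds = time.perf_counter() - start

# One vectorised expression: the loop runs inside NumPy's compiled code
start = time.perf_counter()
bun_vec = column * a
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorised: {vec_seconds:.6f}s")
```

This works because each row is independent. The matrix3d update in the question reads the row computed one step earlier, which a single elementwise expression cannot reproduce.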

Benchmarks run on a MacBook M2.

import numpy as np
import time
import pandas as pd

df = pd.DataFrame(0, index=range(24 * 365), columns=range(15))
matrix3d = np.zeros((24 * 365, 12, 7))

#=============
start_time = time.time()
for p in range(150):
    for m in range(2):
        l = np.repeat(0.7 * np.random.random(365), 24)
        a = np.repeat(0.7 * np.random.random(365), 24)
        pp = 1 + np.floor(15 * np.random.random(7))
        for j in range(7):
            bun = np.asarray(df.iloc[:, int(pp[j]) - 1], dtype=np.float32) * a
            for h in range(1, (24 * 365)):
                matrix3d[h, m, j] = matrix3d[h - 1, m, j] * l[h] + bun[h]  #bottleneck
print(round(time.time() - start_time, 2))
#>>>>>>>>>>> Result 6.71s

Now the improved version:

rows = 24 * 365
cols = 15
datafr = pd.DataFrame(0, index=np.arange(rows), columns=np.arange(cols))
matrix3d = np.zeros((rows, 12, 7))

# Timing start
start_time = time.time()

# Precompute random factors that don't need to be inside the loop
scale_factors = 0.7 * np.random.random((150, 365))
offsets = 1 + np.floor(15 * np.random.random((150, 7)))

# Main computation
for p in range(150):
    # Expand daily values to hourly values (this one array plays the role
    # of both the l and a factors of the loop version)
    daily_growth = np.repeat(scale_factors[p], 24)
    for m in range(2):
        for j in range(7):
            column = int(offsets[p, j]) - 1
            # Apply scaling from corresponding data frame column
            bun = np.asarray(datafr.iloc[:, column], dtype=np.float32) * daily_growth
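            # Update the 3D matrix in one step. NumPy buffers the overlapping
            # input, so row h is computed from the old row h-1, not from the
            # freshly updated value that the loop version uses.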
            np.multiply(matrix3d[:-1, m, j], daily_growth[1:], out=matrix3d[1:, m, j])
            matrix3d[1:, m, j] += bun[1:]

# Report elapsed time
elapsed_time = round(time.time() - start_time, 2)
print(elapsed_time)
#>>>>>>>>>>> Result: 0.16s
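The h loop in the question is a sequential recurrence: each row needs the freshly computed row h-1, so one shifted NumPy expression does not reproduce it. A sketch that keeps that dependency while still cutting most Python-level iterations, assuming the question's shapes, is to vectorise across the 7 independent j slices and run only the h loop in Python. This is a swapped-in technique, not the answer's code; the function name fill_slices is hypothetical, the data are random stand-ins, and no timing is claimed for it.

```python
import numpy as np


def fill_slices(l, bun, y0):
    """Run y[h, :] = y[h-1, :] * l[h] + bun[h, :] for all columns at once.

    l   : shape (rows,)    hourly factor shared by every column
    bun : shape (rows, k)  one input series per column (k = 7 in the question)
    y0  : shape (k,)       starting row
    """
    rows, k = bun.shape
    y = np.empty((rows, k))
    y[0] = y0
    for h in range(1, rows):
        # One small vector operation per hour instead of k scalar ones
        y[h] = y[h - 1] * l[h] + bun[h]
    return y


rng = np.random.default_rng(2)
rows, ncols = 24 * 365, 15
datafr = rng.random((rows, ncols))                 # stands in for the data frame values
l = np.repeat(0.7 * rng.random(365), 24)
a = np.repeat(0.7 * rng.random(365), 24)
pp = 1 + np.floor(15 * rng.random(7)).astype(int)  # 1-based column picks, as in the question
bun = datafr[:, pp - 1] * a[:, None]               # shape (rows, 7)
result = fill_slices(l, bun, np.zeros(7))
```

In the question's loop this would replace the j and h loops for each (p, m) as matrix3d[:, m, :] = fill_slices(l, bun, matrix3d[0, m, :]), still drawing fresh l, a and pp for every m as the original does.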

Upvotes: 0

Vikram

Reputation: 177

R loops used to be slow, around 2014 or 2015. They aren't slow anymore: software and programming languages evolve over time, and such claims don't stay true forever. JavaScript is a perfect example of this.

R for-loops are not slow, and you can use them whenever you want. However, R's garbage collector is slow, so you shouldn't grow a vector inside a loop, which copies it many times. If you avoid that, you are almost always in safe hands.
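The same pitfall exists on the Python side of the question. As an illustrative sketch in NumPy (not R code), growing an array with np.append inside a loop allocates and copies a new array on every call, while preallocating with np.zeros and filling in place allocates once. The function names and the size 2000 are hypothetical.

```python
import numpy as np


def grow_by_append(n):
    """Anti-pattern: np.append returns a new, copied array on every call."""
    values = np.array([])
    for i in range(n):
        values = np.append(values, i * 0.5)
    return values


def preallocate(n):
    """Allocate once, then fill in place."""
    values = np.zeros(n)
    for i in range(n):
        values[i] = i * 0.5
    return values


grown = grow_by_append(2000)
filled = preallocate(2000)
```

Both return the same values; the difference is that the first version's total copying grows quadratically with n.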

If you need more speed from a loop, you could also try the set() function from data.table, or parallelize the loop.

Upvotes: 4
