I am working on code that performs a transient simulation by calling a function for each second of a day, and eventually of a whole year. I thought I had found an opportunity to speed up the code by passing a vector of inputs instead of looping over them, but my code runs slower when I do this and I don't understand why.
To reach my target speed-up, I would expect one vectorized call to be only slightly slower than a single scalar call.
Can you please help explain and/or solve this issue? I have shown three approaches in my sample below, which is a simplification of the larger program.
Inside the function is the Eg variable, which is currently set to zero. If I instead set Eg=float(0) or Eg=np.array([0,0,0,0,0]), the code runs slower, and I assume this is the same issue as the larger question.
The results from the code below are:
Execution time for numpy vector is 716.225 ms
Execution time for 'for-loop' 6 calls is 389.87 ms
Execution time for numpy float32 'for-loop' is 3906.9069999999997 ms
Code sample:
from datetime import datetime, timedelta
import numpy as np
def Q_Walls(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p):
    Ein = (T_inf_outside - T_p) * A / (R2/2 + R3 + R4)  # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/2)  # convection and conduction only
    Eg = 0
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p)  # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/2 / A)
    return T_p1, Eout, T2_surf
def Q_Walls_vect(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p):
    Ein = (T_inf_outside - T_p) * A / (R2/2 + R3 + R4)  # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/2)  # convection and conduction only
    Eg = 0  # np.array([0,0,0,0,0], 'float64')
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p)  # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/2 / A)
    return T_p1, Eout, T2_surf
A= R1= R2= R3= R4= m= cp= np.array([1,1,1,1,1], 'float32')
dt= np.array([1,1,1,1,1], 'float32')
T_inf_inside = np.array([250,250,250,250,250], 'float32')
T_inf_outside = np.array([250.2,250.2,250.2,250.2,250.2], 'float32')
T_p_wall = np.array([250.1,250.1,250.1,250.1,250.1], 'float32')
t_max =87000
begin_time = datetime.now()
for x in np.arange(t_max):
    T_p_wall, Enet_wall, Tinside_surf = Q_Walls_vect(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall)
end_time = (datetime.now() - begin_time)
print(f"Execution time for numpy vector is {end_time.total_seconds()*1000} ms")
A= R1= R2= R3= R4= m= cp= float(1.1)
dt= float(1)
T_inf_inside = float(250.01)
T_p_wall = float(250.1)
T_inf_outside = float(250.2)
begin_time = datetime.now()
for x in np.arange(t_max):
    for j in range(6):
        T_p_wall, Enet_wall, Tinside_surf = Q_Walls(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall)
end_time = (datetime.now() - begin_time)
print(f"Execution time for 'for-loop' 6 calls is {end_time.total_seconds()*1000} ms")
A= R1= R2= R3= R4= m= cp= np.float32(1.1)
dt= 1
T_inf_inside = np.float32(250.01)
T_p_wall = np.float32(250.1)
T_inf_outside = np.float32(250.2)
begin_time = datetime.now()
for x in np.arange(t_max):
    for j in range(6):
        T_p_wall, Enet_wall, Tinside_surf = Q_Walls(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall)
end_time = (datetime.now() - begin_time)
print(f"Execution time for numpy float32 'for-loop' is {end_time.total_seconds()*1000} ms")
Upvotes: 4
Views: 443
Reputation: 50946
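There are multiple issues occurring in the code.
First, NumPy operations carry a fixed per-call overhead, which dominates the run time when an array holds only a few elements or when the operands are NumPy scalars.
Second, the code contains mixed-type operations such as [int] BIN_OP [float32] and [float32] BIN_OP [float64], as well as the same combinations with the operands in reverse order. This causes more temporary arrays to be created and several implicit conversions to be done, making the code significantly slower.
The second point can be fixed using the following example code: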
f32_const_0 = np.float32(0)
f32_const_2 = np.float32(2)
def Q_Walls_float32(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p):
    Ein = (T_inf_outside - T_p) * A / (R2/f32_const_2 + R3 + R4)  # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/f32_const_2)  # convection and conduction only
    Eg = f32_const_0
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p)  # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/f32_const_2 / A)
    return T_p1, Eout, T2_surf
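To check which operand combinations cause a promotion to float64, a small probe like the one below may help. It is only a sketch: `result_dtype` is a hypothetical helper, not part of NumPy, and it uses multiplication as a stand-in for any binary operation. The result for a NumPy scalar combined with a plain Python number depends on the NumPy version, so that case is printed without an expected value.

```python
import numpy as np

def result_dtype(a, b):
    """Return the dtype NumPy produces for a * b (stand-in for any BIN_OP)."""
    return np.asarray(a * b).dtype

f32 = np.ones(5, dtype=np.float32)
f64 = np.ones(5, dtype=np.float64)
i64 = np.ones(5, dtype=np.int64)

print(result_dtype(f32, f32))             # float32: no conversion needed
print(result_dtype(f32, np.float32(2)))   # float32: constant has a matching type
print(result_dtype(f32, f64))             # float64: the float32 operand is converted
print(result_dtype(f32, i64))             # float64: float32 cannot hold every int64

# Version-dependent case: NumPy scalar combined with a Python int.
print(result_dtype(np.float32(1.1), 2))
```

If every line of a function reports float32, the function performs no implicit up-conversion for those inputs.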
You can mitigate the cost with Numba (or Cython), but the best option is to avoid NumPy arrays that hold only a few elements, or better still to do the computation element-wise directly in Numba so that few temporary arrays are created.
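The cost of using NumPy on only a few elements can be made visible with a rough measurement such as the following. This is a sketch under simplifying assumptions: `wall_step` and `time_per_element` are hypothetical names, the formula is the wall update from the question with A, the resistances, m, cp and dt all set to 1, and the absolute numbers will differ from machine to machine, so no expected output is shown.

```python
import timeit
import numpy as np

def wall_step(t_p, t_in, t_out):
    """Simplified wall update; accepts Python floats or NumPy arrays."""
    e_in = (t_out - t_p) / 2.5    # R2/2 + R3 + R4 with every R equal to 1
    e_out = (t_p - t_in) / 1.5    # R1 + R2/2 with every R equal to 1
    return t_p + (e_in - e_out)   # dt / (m * cp) equals 1

def time_per_element(n, calls=500):
    """Average seconds per element for wall_step on n-element float64 arrays."""
    t_p = np.full(n, 250.1)
    t_in = np.full(n, 250.0)
    t_out = np.full(n, 250.2)
    total = timeit.timeit(lambda: wall_step(t_p, t_in, t_out), number=calls)
    return total / (calls * n)

scalar_total = timeit.timeit(lambda: wall_step(250.1, 250.0, 250.2), number=500)
print(f"python floats  : {scalar_total / 500:.2e} s per element")
for n in (5, 500, 50_000):
    print(f"array of {n:>6}: {time_per_element(n):.2e} s per element")
```

The per-element time typically drops sharply as the arrays grow, because the fixed cost of each NumPy call is spread over more elements.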
Here is an example of Numba code:
from datetime import datetime, timedelta
import numpy as np
import numba as nb

t_max = 87000  # number of time steps, as in the question's code

A = R1 = R2 = R3 = R4 = m = cp = float(1.1)
dt = float(1)
T_inf_inside = float(250.01)
T_p_wall = float(250.1)
T_inf_outside = float(250.2)

# Explicit signature: 11 float64 arguments in, a tuple of 3 float64 values out
@nb.njit(nb.types.UniTuple(nb.float64, 3)(nb.float64, nb.float64, nb.float64, nb.float64, nb.float64, nb.float64, nb.float64,
                                           nb.float64, nb.float64, nb.float64, nb.float64))
def Q_Walls(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p):
    Ein = (T_inf_outside - T_p) * A / (R2/2 + R3 + R4)  # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/2)  # convection and conduction only
    Eg = 0
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p)  # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/2 / A)
    return (T_p1, Eout, T2_surf)

# The whole time loop runs inside compiled code, so Q_Walls is never called from the interpreter
@nb.njit(nb.types.UniTuple(nb.float64, 3)(nb.float64, nb.float64, nb.float64, nb.float64, nb.float64, nb.float64, nb.float64,
                                           nb.float64, nb.float64, nb.float64, nb.float64))
def compute_with_numba(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall):
    for x in np.arange(t_max):
        for j in range(6):
            T_p_wall, Enet_wall, Tinside_surf = Q_Walls(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall)
    return (T_p_wall, Enet_wall, Tinside_surf)

begin_time = datetime.now()
T_p_wall, Enet_wall, Tinside_surf = compute_with_numba(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall)
end_time = (datetime.now() - begin_time)
print(f"Execution time for 'for-loop' 6 calls is {end_time.total_seconds()*1000} ms")
Here are timing results on my machine:
Initial execution:
Execution time for numpy vector is 758.232 ms
Execution time for 'for-loop' 6 calls is 256.093 ms
Execution time for numpy float32 'for-loop' is 3768.253 ms
----------
Fixed execution (Q_Walls_float32):
Execution time for numpy float32 'for-loop' is 839.016 ms
----------
With Numba (compute_with_numba):
Execution time for 'for-loop' 6 calls is 6.311 ms
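The figures above come from single runs timed with datetime.now(). For anyone reproducing them, a repeated measurement with time.perf_counter() is usually less noisy. The helper below is only a sketch: `best_of` and `workload` are hypothetical names, and `workload` is a placeholder for whichever loop is being measured.

```python
import time

def best_of(func, repeats=5):
    """Run func() `repeats` times and return the shortest wall-clock time in ms."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best

def workload():
    """Placeholder for the code under test, e.g. one of the loops above."""
    total = 0.0
    for i in range(100_000):
        total += i * 0.5
    return total

print(f"best of 5: {best_of(workload):.3f} ms")
```

Taking the minimum over several runs reduces the influence of other processes on the measurement. When timing Numba code, note that functions declared with an explicit signature are compiled when they are defined, so the timed call should not include compilation.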