I am using two nested for loops to calculate a value from combinations of elements in a list of DataFrames. The list contains a large number of DataFrames, and the nested loops take a considerable amount of time.
Is there a way I can do this operation faster?
The functions with dummy names are the ones where I calculate the results.
My code looks like this:
conf_list = []
for tr in range(len(trajectories)):
    df_1 = trajectories[tr]
    if len(df_1) == 0:
        continue
    for tt in range(len(trajectories)):
        df_2 = trajectories[tt]
        if len(df_2) == 0:
            continue
        if df_1.equals(df_2) or df_1['time'].iloc[0] > df_2['time'].iloc[-1] or df_2['time'].iloc[0] > df_1['time'].iloc[-1]:
            continue
        df_temp = cartesian_product_basic(df_1, df_2)
        flg, df_temp = another_function(df_temp)
        if flg == 0:
            continue
        flg_h = some_other_function(df_temp)
        if flg_h == 1:
            conf_list.append(1)
My input list consists of around 5000 DataFrames, each with several hundred rows, that look like this:
id | x | y | z | time |
---|---|---|---|---|
1 | 5 | 7 | 2 | 5 |
What I do is take the Cartesian product of each pair of DataFrames and, for each pair, calculate another value 'c'. If c meets a condition, I append an element to conf_list, so that at the end I get the number of pairs meeting the requirement.
For further info:
cartesian_product_basic(df_1, df_2) is a function that computes the Cartesian product of two DataFrames.
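The body of cartesian_product_basic is not shown. A minimal sketch, assuming pandas >= 1.2 and that it behaves like a cross merge with the default `_x`/`_y` suffixes (which matches the column names `time_x`, `z_x`, `x_y` used below), could look like this. It also shows where the time goes: the result has `len(df_1) * len(df_2)` rows, e.g. 90,000 rows for two 300-row frames, and with 5000 frames the nested loops visit up to 25 million ordered pairs.

```python
import pandas as pd


def cartesian_product_basic(df_1: pd.DataFrame, df_2: pd.DataFrame) -> pd.DataFrame:
    """Pair every row of df_1 with every row of df_2 (hypothetical stand-in).

    Columns present in both frames get merge's default suffixes, so 'time'
    becomes 'time_x' / 'time_y', 'z' becomes 'z_x' / 'z_y', and so on.
    """
    # how="cross" is available from pandas 1.2 onwards
    return df_1.merge(df_2, how="cross")


a = pd.DataFrame({"id": [1, 1], "x": [5, 6], "y": [7, 7], "z": [2, 2], "time": [5, 6]})
b = pd.DataFrame({"id": [2, 2, 2], "x": [0, 1, 2], "y": [0, 0, 0], "z": [3, 3, 3], "time": [4, 5, 6]})
prod = cartesian_product_basic(a, b)
print(len(prod))  # 6 (2 rows x 3 rows)
```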
another_function looks like this:
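import numpy as np

def another_function(df_temp):
    # keep only row pairs recorded at the same time, storing their vertical separation
    df_temp['z_dif'] = np.where(df_temp['time_x'] == df_temp['time_y'],
                                abs(df_temp['z_x'] - df_temp['z_y']), np.nan)
    df_temp = df_temp.dropna()
    # keep only row pairs that are less than 1000 apart vertically
    df_temp['vert_conf'] = np.where(df_temp['z_dif'] >= 1000, np.nan, 1)
    df_temp = df_temp.dropna()
    if len(df_temp) == 0:
        flg = 0
    else:
        flg = 1
    return flg, df_temp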
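Since another_function first throws away every row where `time_x != time_y`, the full Cartesian product is mostly wasted work: keeping only equal times is exactly an inner merge on `time`. A minimal sketch of that idea follows; `vertical_candidates` and `min_vert_sep` are hypothetical names. It assumes the frames contain no missing values (the original `dropna()` would also drop rows with a NaN in any other column), and note that the merge yields a single `time` column instead of `time_x`/`time_y`.

```python
import pandas as pd


def vertical_candidates(df_1: pd.DataFrame, df_2: pd.DataFrame, min_vert_sep: float = 1000) -> pd.DataFrame:
    """Row pairs at the same time that are vertically closer than min_vert_sep.

    Hypothetical replacement for cartesian_product_basic + another_function:
    an inner merge on 'time' builds only the time_x == time_y rows of the
    Cartesian product instead of building all rows and filtering afterwards.
    """
    pairs = df_1.merge(df_2, on="time", suffixes=("_x", "_y"))
    z_dif = (pairs["z_x"] - pairs["z_y"]).abs()
    return pairs[z_dif < min_vert_sep]


a = pd.DataFrame({"id": [1, 1], "x": [5, 6], "y": [7, 7], "z": [2, 2000], "time": [5, 6]})
b = pd.DataFrame({"id": [2, 2, 2], "x": [0, 1, 2], "y": [0, 0, 0], "z": [3, 3, 3], "time": [4, 5, 6]})
close = vertical_candidates(a, b)
print(len(close))  # 1: at time 6 the vertical gap is 1997
flg = 0 if close.empty else 1
```

The merge only ever materialises matching time steps, so for trajectories that overlap briefly it builds far fewer rows than the cross product.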
And some_other_function looks like this:
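import numpy as np

def some_other_function(df_temp):
    # horizontal offsets between the two trajectories at each shared time step
    df_temp['x_dif'] = df_temp['x_x'] - df_temp['x_y']
    df_temp['y_dif'] = df_temp['y_x'] - df_temp['y_y']
    df_temp['hor_dif'] = np.hypot(df_temp['x_dif'], df_temp['y_dif'])
    df_temp['conf'] = np.where(df_temp['hor_dif'] <= 5, 1, np.nan)
    # set flg_h on both branches; otherwise it is unbound when no row is within 5
    flg_h = 1 if df_temp['conf'].sum() > 0 else 0
    return flg_h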
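The horizontal test only needs a yes/no answer, so it can be computed on NumPy arrays without adding helper columns to the DataFrame. A minimal sketch, assuming `x_dif`/`y_dif` are meant to be coordinate differences so that `hypot` gives the horizontal distance; `has_horizontal_conflict` and `max_hor_sep` are hypothetical names.

```python
import numpy as np
import pandas as pd


def has_horizontal_conflict(pairs: pd.DataFrame, max_hor_sep: float = 5) -> bool:
    """True if any row pair is within max_hor_sep horizontally (hypothetical helper).

    Works on plain NumPy arrays and never writes new columns into `pairs`.
    """
    dx = pairs["x_x"].to_numpy() - pairs["x_y"].to_numpy()
    dy = pairs["y_x"].to_numpy() - pairs["y_y"].to_numpy()
    return bool((np.hypot(dx, dy) <= max_hor_sep).any())


pairs = pd.DataFrame({"x_x": [0.0, 10.0], "x_y": [3.0, 0.0], "y_x": [0.0, 0.0], "y_y": [4.0, 0.0]})
print(has_horizontal_conflict(pairs))            # True: first row is exactly 5 apart
print(has_horizontal_conflict(pairs.iloc[[1]]))  # False: second row is 10 apart
```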
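The following are ways to make your code run faster:
for-loop
Use list comprehensions, map, filter, sum, etc. instead of explicit for loops; this often makes loop-heavy code faster.
Import
Avoid repeated attribute lookups in hot code by binding the function to a local name once:
import datetime
A = datetime.datetime.now()  # don't use this inside a loop
timenow = datetime.datetime.now  # look the method up once
A = timenow()  # use this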
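Applied to the loop in the question, a minimal sketch of the "use sum instead of a for loop" advice could count pairs with `sum()` over a generator and iterate with `itertools.combinations`, so each unordered pair is visited once and each frame's time span is computed only once. `count_conflicting_pairs` and `is_conflict` are hypothetical names; `is_conflict` stands for the question's cartesian_product_basic / another_function / some_other_function chain. Two behavioural differences: the original loop checks both (i, j) and (j, i), so for a symmetric test it counts each conflicting pair twice; and `combinations` skips only a frame paired with itself, not two distinct frames with identical content (which `df_1.equals(df_2)` also skipped).

```python
from itertools import combinations
from typing import Callable, Sequence

import pandas as pd


def count_conflicting_pairs(
    trajectories: Sequence[pd.DataFrame],
    is_conflict: Callable[[pd.DataFrame, pd.DataFrame], bool],
) -> int:
    """Count unordered pairs of trajectories for which is_conflict returns True."""
    # Drop empty frames and precompute each (time-sorted) frame's span once.
    frames = [df for df in trajectories if len(df) > 0]
    spans = [(df["time"].iloc[0], df["time"].iloc[-1]) for df in frames]
    return sum(
        1
        for i, j in combinations(range(len(frames)), 2)
        # same overlap test as in the question, using the precomputed spans
        if not (spans[i][0] > spans[j][1] or spans[j][0] > spans[i][1])
        and is_conflict(frames[i], frames[j])
    )


t1 = pd.DataFrame({"time": [0, 1, 2]})
t2 = pd.DataFrame({"time": [2, 3]})
t3 = pd.DataFrame({"time": [10, 11]})
empty = pd.DataFrame({"time": []})
print(count_conflicting_pairs([t1, t2, t3, empty], lambda a, b: True))  # 1: only t1/t2 overlap
```

Halving the number of pairs and skipping non-overlapping ones early is likely to matter more than the loop syntax itself, since each surviving pair still runs the per-pair DataFrame work.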
if-else
To check a Boolean value, avoid using the comparison operator ==.
# Instead of this approach
if a == 1:
    print('a is 1')
else:
    print('a is 0')
# Try this approach
if a:
    print('a is 1')
else:
    print('a is 0')
# This saves the (small) time spent comparing the two values; it is only equivalent when a is 0 or 1.
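The two forms above are interchangeable only when `a` is guaranteed to be 0 or 1 (or False/True). A small sketch showing where they diverge; `describe_truthy` and `describe_equal` are illustrative names, not from the original answer.

```python
def describe_truthy(a):
    # relies on truthiness: any non-zero value counts as "1"
    return "a is 1" if a else "a is 0"


def describe_equal(a):
    # explicit comparison: only the value 1 counts as "1"
    return "a is 1" if a == 1 else "a is 0"


for value in (0, 1, 2):
    print(value, describe_truthy(value), describe_equal(value))
```

For 0 and 1 both functions agree; for 2 the truthiness check reports "a is 1" while the comparison reports "a is 0", so the shortcut is safe only for genuine flags like `flg` and `flg_h`.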