Reputation: 1527
Yes, this question has been asked many times! No, I still have not been able to figure out how to run this boolean filter without triggering pandas' SettingWithCopyWarning.
for x in range(len(df_A)):
    df_C = df_A.loc[(df_A['age'] >= df_B['age_limits'].iloc[x][0]) &
                    (df_A['age'] <= df_B['age_limits'].iloc[x][1])]
    df_D['count'].iloc[x] = len(df_C)  # triggers warning
I've tried:
I know I can suppress the warning, but I don't want to do that.
What am I missing? I know it's probably something obvious.
Many thanks!
Upvotes: 0
Views: 52
Reputation: 29635
For more details on why you got the SettingWithCopyWarning, I suggest reading this answer. It is mostly because selecting the column df_D['count'] and then using iloc[x] on the result is a "chained assignment", which pandas flags this way.
To prevent it, you can get the position of the column you want in df_D and then use iloc for both the row and the column inside the for loop:
pos_col_D = df_D.columns.get_loc('count')  # get_loc is a method, so call it with ()
for x in range(len(df_A)):
    df_C = df_A.loc[(df_A['age'] >= df_B['age_limits'].iloc[x][0]) &
                    (df_A['age'] <= df_B['age_limits'].iloc[x][1])]
    df_D.iloc[x, pos_col_D] = len(df_C)  # no more warning
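As a self-contained sketch of the single-step assignment described above: the sample frames are illustrative, and the layout of df_D (one 'count' row per row of df_B) is an assumption, since the question does not show it. The loop here runs over the rows of df_B, which matches the original loop only when df_A and df_B have the same length, and Series.between (inclusive on both ends by default) stands in for the explicit >= and <= comparison.

```python
import pandas as pd

# Illustrative data; df_D's shape is assumed, it is not shown in the question.
df_A = pd.DataFrame({'age': [12, 25, 32]})
df_B = pd.DataFrame({'age_limits': [[3, 99], [20, 45], [15, 30]]})
df_D = pd.DataFrame({'count': [0] * len(df_B)})

# Integer position of the target column, so iloc can address row and column at once.
pos_col_D = df_D.columns.get_loc('count')

for x in range(len(df_B)):
    low, high = df_B['age_limits'].iloc[x]
    # Count the ages that fall inside [low, high].
    n_in_range = int(df_A['age'].between(low, high).sum())
    # One indexing call on df_D itself, so there is no intermediate object to warn about.
    df_D.iloc[x, pos_col_D] = n_in_range

print(df_D)
```

Because the row and the column are selected in one iloc call, pandas writes straight into df_D instead of into a temporary Series.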
Also, because you compare all the values of df_A.age with the bounds in df_B.age_limits, I think you could improve the speed of your code using numpy.ufunc.outer, with the ufunc being greater_equal and less_equal, and then sum over axis=0.
# Setup
import numpy as np
import pandas as pd
df_A = pd.DataFrame({'age': [12, 25, 32]})
df_B = pd.DataFrame({'age_limits': [[3, 99], [20, 45], [15, 30]]})

# your result
for x in range(len(df_A)):
    df_C = df_A.loc[(df_A['age'] >= df_B['age_limits'].iloc[x][0]) &
                    (df_A['age'] <= df_B['age_limits'].iloc[x][1])]
    print(len(df_C))
3
2
1

# with numpy
# (.to_numpy() is used because ufunc.outer may not be supported on a Series in newer pandas versions)
ages = df_A.age.to_numpy()
print((np.greater_equal.outer(ages, df_B.age_limits.str[0].to_numpy())
       & np.less_equal.outer(ages, df_B.age_limits.str[1].to_numpy()))
      .sum(0))
[3 2 1]
So you can assign the result of that last expression directly to df_D['count'], without a for loop. Hope this works for you.
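A minimal sketch of that loop-free assignment, under a few assumptions: df_D is taken to share df_B's index (its real layout is not shown in the question), and the inputs are converted to plain NumPy arrays first, since calling ufunc.outer directly on a Series may not be supported in newer pandas versions.

```python
import numpy as np
import pandas as pd

df_A = pd.DataFrame({'age': [12, 25, 32]})
df_B = pd.DataFrame({'age_limits': [[3, 99], [20, 45], [15, 30]]})
df_D = pd.DataFrame(index=df_B.index)  # assumed layout: one row per age range

ages = df_A['age'].to_numpy()
# Turn the column of [low, high] lists into a 2-D array of shape (n_ranges, 2).
limits = np.array(df_B['age_limits'].tolist())
lower, upper = limits[:, 0], limits[:, 1]

# in_range[i, j] is True when ages[i] lies within [lower[j], upper[j]].
in_range = np.greater_equal.outer(ages, lower) & np.less_equal.outer(ages, upper)

# Summing over axis 0 counts, for each range, how many ages fall inside it.
df_D['count'] = in_range.sum(axis=0)
print(df_D)
```

The intermediate boolean matrix has len(df_A) × len(df_B) entries, so this trades memory for speed; for very large frames the loop or a chunked version may be the safer choice.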
Upvotes: 1