Reputation: 815
I have a pandas DataFrame df of shape (5568, 108) where the column of interest is df.Age, which has some NaNs (303 of them). I want to keep the NaN rows but drop the outliers, i.e. the rows that something like df.drop(df[df.Age<18]) and df.drop(df[df.Age>90]) is meant to remove.
I tried
for index, rows in df.iterrows():
    if (df.loc[index, 'Age'] > 0.0 & df.loc[index, 'Age'] < 18.0):
        df.drop(df.iloc[index])
    elif (df.loc[index, 'Age'] > 0.0 & df.loc[index, 'Age'] > 90.0):
        df.drop(df.iloc[index])
    else:
        continue
But this results in
TypeError: unsupported operand type(s) for &: 'float' and 'numpy.float64'
Any thoughts on how I can achieve this?
Upvotes: 1
Views: 98
Reputation: 862406
I think you need boolean indexing with between and isnull for filtering, which is more common than drop for removing rows by condition:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 90, 88, np.nan], 'a': [10, 20, 40, 50, 90]})
print(df)
    Age   a
0  10.0  10
1  20.0  20
2  90.0  40
3  88.0  50
4   NaN  90
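# inclusive="neither" excludes the bounds 18 and 90 themselves; older pandas
# spelled this inclusive=False (boolean values were removed in pandas 2.0)
print(df['Age'].between(18, 90, inclusive="neither") | df['Age'].isnull())
0    False
1     True
2    False
3     True
4     True
Name: Age, dtype: bool
df = df[df['Age'].between(18, 90, inclusive="neither") | df['Age'].isnull()]
print(df)
    Age   a
1  20.0  20
3  88.0  50
4   NaN  90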
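For comparison, a minimal sketch of the drop-based variant the question was reaching for, assuming the same toy frame as above. Comparisons against NaN evaluate to False, so rows with a missing Age are never selected for dropping; the key change from the question's attempt is passing the index labels of the selected rows (df[...].index) to drop rather than the sub-frame itself. Unlike the inclusive="neither" filter above, this keeps the boundary ages 18 and 90, matching the question's Age < 18 / Age > 90 wording.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'Age': [10, 20, 90, 88, np.nan], 'a': [10, 20, 40, 50, 90]})

    # Rows outside the allowed range; NaN < 18 and NaN > 90 are both False,
    # so rows with a missing Age are not selected here.
    outliers = (df['Age'] < 18) | (df['Age'] > 90)

    # drop() expects index labels, so pass the index of the outlier rows.
    dropped = df.drop(df[outliers].index)

    # Equivalent filter: between() is inclusive on both ends by default.
    kept = df[df['Age'].between(18, 90) | df['Age'].isna()]

    print(dropped)

Both approaches keep rows 1 to 4 here; the boolean-indexing form is usually preferred because it states what to keep in one expression.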
Upvotes: 1
Reputation: 8829
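There is an operator precedence issue: & binds more tightly than >, so without parentheses Python evaluates 0.0 & df.loc[index, 'Age'] first, and & is not defined for floats. Wrap each comparison in parentheses: (df.loc[index, 'Age'] > 0.0) & (df.loc[index, 'Age'] < 18.0), and likewise in the elif.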
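To illustrate the precedence point, a minimal sketch in which np.float64(10.0) stands in for df.loc[index, 'Age'] (the helper is_outlier is hypothetical, not from the thread). Unparenthesized, the condition parses as age > (0.0 & age) < 18.0, which raises the TypeError; parenthesized, & combines two booleans. For scalar conditions inside a loop, Python's and/or is the more idiomatic choice, with & and | reserved for element-wise Series masks. The loop also collects labels and calls drop once, since the original drop calls neither received index labels nor had their results assigned.

    import numpy as np
    import pandas as pd

    age = np.float64(10.0)

    # Unparenthesized: & binds tighter than >, so Python evaluates 0.0 & age first.
    try:
        age > 0.0 & age < 18.0
    except TypeError as exc:
        print("TypeError:", exc)

    # Parenthesized comparisons give two booleans, which & can combine.
    print((age > 0.0) & (age < 18.0))  # prints True

    def is_outlier(value):
        # Hypothetical helper: NaN fails both comparisons, so missing ages are kept.
        return value < 18.0 or value > 90.0

    df = pd.DataFrame({'Age': [10, 20, 90, 88, np.nan], 'a': [10, 20, 40, 50, 90]})

    # Collect labels while iterating, then drop once and assign the result.
    to_drop = [label for label, value in df['Age'].items() if is_outlier(value)]
    df = df.drop(to_drop)
    print(df)

This loop works, but the vectorized mask in the previous answer does the same job without iterating row by row.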
Upvotes: 1