Tanmoy

Reputation: 815

Drop rows based on column value keeping NaNs

I have a pandas DataFrame df of shape (5568, 108). The column of interest is df.Age, which contains 303 NaNs. I want to keep the NaN rows but drop the outliers, with something like df.drop(df[df.Age<18]) and df.drop(df[df.Age>90]).

I tried

for index, rows in df.iterrows():
    if (df.loc[index, 'Age'] > 0.0 & df.loc[index, 'Age'] < 18.0):
        df.drop(df.iloc[index])
    elif (df.loc[index, 'Age'] > 0.0 & df.loc[index, 'Age'] > 90.0):
        df.drop(df.iloc[index])
    else:
        continue

But this results in

TypeError: unsupported operand type(s) for &: 'float' and 'numpy.float64'

Any thoughts on how I can achieve this?

Upvotes: 1

Views: 98

Answers (2)

jezrael

Reputation: 862406

I think you need boolean indexing with between and isnull. Filtering with a boolean mask is the most common way to drop rows by condition:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age':[10,20,90,88,np.nan], 'a': [10,20,40,50,90]})
print (df)
    Age   a
0  10.0  10
1  20.0  20
2  90.0  40
3  88.0  50
4   NaN  90

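# inclusive='neither' excludes both bounds (pandas >= 1.3; older versions used inclusive=False)
print ((df['Age'].between(18,90, inclusive='neither')) | (df['Age'].isnull()))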
0    False
1     True
2    False
3     True
4     True
Name: Age, dtype: bool

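# inclusive='neither' excludes both bounds (pandas >= 1.3; older versions used inclusive=False)
df = df[(df['Age'].between(18,90, inclusive='neither')) | (df['Age'].isnull())]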
print (df)
    Age   a
1  20.0  20
3  88.0  50
4   NaN  90
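
Note that the exclusive bounds above also drop the boundary ages 18 and 90. If those should be kept, as the question's df.Age<18 and df.Age>90 conditions suggest, a minimal sketch is to build the mask of rows to drop and negate it. It reuses the sample frame from this answer; the names outlier and filtered are only illustrative. Because comparisons against NaN evaluate to False, the NaN rows are never flagged and no separate isnull check is needed here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [10, 20, 90, 88, np.nan], 'a': [10, 20, 40, 50, 90]})

# Rows to drop: ages strictly below 18 or strictly above 90.
# NaN < 18 and NaN > 90 are both False, so NaN rows are not flagged.
outlier = (df['Age'] < 18) | (df['Age'] > 90)

# Keep every row that is not flagged as an outlier.
filtered = df[~outlier]
print(filtered)
```

On the sample data this keeps the rows with Age 20, 90, 88 and NaN, and drops only the row with Age 10.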

Upvotes: 1

Arya McCarthy

Reputation: 8829

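This is an operator precedence issue. In Python, & binds more tightly than the comparison operators, so without parentheses the expression 0.0 & df.loc[index, 'Age'] is evaluated first, and that raises the TypeError. Wrap each comparison in parentheses: (df.loc[index, 'Age'] > 0.0) & (df.loc[index, 'Age'] < 18.0), and likewise for the other condition.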
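
The precedence problem can be reproduced on a single value. This is a small sketch in which age is a stand-in for df.loc[index, 'Age'] and the other variable names are illustrative only.

```python
import numpy as np

age = np.float64(25.0)  # stand-in for one value read from df.loc[index, 'Age']

# Without parentheses, Python parses `age > 0.0 & age < 18.0` as
# `age > (0.0 & age) < 18.0`, and `0.0 & age` raises TypeError.
try:
    age > 0.0 & age < 18.0
    raised = False
except TypeError:
    raised = True

# With parentheses, each comparison is evaluated before `&` combines them.
is_minor = (age > 0.0) & (age < 18.0)

# For single scalar values, a chained comparison is an equivalent alternative.
is_minor_chained = 0.0 < age < 18.0

print(raised, bool(is_minor), is_minor_chained)
```

Fixing the parentheses removes the TypeError, but the loop would still not remove any rows: DataFrame.drop expects index labels rather than a row, and it returns a new frame unless the result is assigned back.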

Upvotes: 1
