Reputation: 133
How can I delete observations (rows) from a data frame in Python? For example, I have a data frame with variables a, b, and c, and I want to delete an observation if variable a is missing or variable c is equal to zero.
Upvotes: 0
Views: 3766
Reputation: 879093
You could build a boolean mask using isnull:
mask = (df['a'].isnull()) | (df['c'] == 0)
and then select the desired rows with:
df = df.loc[~mask]
~mask is the boolean inverse of mask, so df.loc[~mask] selects rows where a is not null and c is not 0.
For example,
import numpy as np
import pandas as pd
arr = np.arange(15, dtype='float').reshape(5,3) % 4
arr[arr > 2] = np.nan
df = pd.DataFrame(arr, columns=list('abc'))
# a b c
# 0 0 1 2
# 1 NaN 0 1
# 2 2 NaN 0
# 3 1 2 NaN
# 4 0 1 2
mask = (df['a'].isnull()) | (df['c'] == 0)
df = df.loc[~mask]
yields
a b c
0 0 1 2
3 1 2 NaN
4 0 1 2
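As a rough sketch of an equivalent approach (not part of the original answer), the keep-condition can also be written directly instead of inverting the mask. It assumes the same example frame as above; the names keep and result are just illustrative. Note that NaN != 0 evaluates to True, so a row with a missing c is kept, exactly as with ~mask.

```python
import numpy as np
import pandas as pd

# Same example frame as in the answer above
arr = np.arange(15, dtype='float').reshape(5, 3) % 4
arr[arr > 2] = np.nan
df = pd.DataFrame(arr, columns=list('abc'))

# Keep rows where a is present AND c is not zero.
# This is the De Morgan equivalent of ~(a.isnull() | (c == 0)).
keep = df['a'].notnull() & (df['c'] != 0)
result = df.loc[keep]
print(result)
```

This should select the same rows (index 0, 3 and 4) as df.loc[~mask].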
Upvotes: 2
Reputation: 11938
Let's say your DataFrame looks like this:
In [1]: import numpy as np; import pandas as pd
In [2]: data = pd.DataFrame({
...: 'a': [1,2,3,np.nan,5],
...: 'b': [3,4,np.nan,5,6],
...: 'c': [0,1,2,3,4],
...: })
In [3]: data
Out[3]:
a b c
0 1 3 0
1 2 4 1
2 3 NaN 2
3 NaN 5 3
4 5 6 4
To delete rows with missing observations, use:
In [5]: data.dropna()
Out[5]:
a b c
0 1 3 0
1 2 4 1
4 5 6 4
To delete only the rows where column 'a' is missing (ignoring missing values in other columns), use:
In [6]: data.dropna(subset=['a'])
Out[6]:
a b c
0 1 3 0
1 2 4 1
2 3 NaN 2
4 5 6 4
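To delete rows that have a missing value or a zero in any column, use the following (data.all(axis=1) is False for rows containing a zero, and dropna() then removes the rows containing NaN):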
In [18]: data[data.all(axis=1)].dropna()
Out[18]:
a b c
1 2 4 1
4 5 6 4
Upvotes: 0