Lodyk Vovchak

Reputation: 133

Drop observations from a data frame in Python

How do I delete observations (rows) from a data frame in Python? For example, I have a data frame with variables a, b, and c, and I want to delete an observation if variable a is missing or variable c is equal to zero.

Upvotes: 0

Views: 3766

Answers (2)

unutbu

Reputation: 879093

You could build a boolean mask using isnull:

mask = (df['a'].isnull()) | (df['c'] == 0)

and then select the desired rows with:

df = df.loc[~mask]

~mask is the boolean inverse of mask, so df.loc[~mask] selects rows where a is not null and c is not 0.


For example,

import numpy as np
import pandas as pd

arr = np.arange(15, dtype='float').reshape(5,3) % 4
arr[arr > 2] = np.nan

df = pd.DataFrame(arr, columns=list('abc'))
#     a   b   c
# 0   0   1   2
# 1 NaN   0   1
# 2   2 NaN   0
# 3   1   2 NaN
# 4   0   1   2

mask = (df['a'].isnull()) | (df['c'] == 0)
df = df.loc[~mask]

yields

   a  b   c
0  0  1   2
3  1  2 NaN
4  0  1   2
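
The same selection can also be written as a "keep" condition directly, since the inverse of "a is null or c is 0" is "a is not null and c is not 0". A minimal sketch reusing the example frame above; the names keep and result are only illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from above
arr = np.arange(15, dtype='float').reshape(5, 3) % 4
arr[arr > 2] = np.nan
df = pd.DataFrame(arr, columns=list('abc'))

# Rows to drop: a is missing OR c is zero
mask = df['a'].isnull() | (df['c'] == 0)

# Rows to keep: a is present AND c is not zero
keep = df['a'].notnull() & (df['c'] != 0)

result = df.loc[keep]
```

Note that NaN != 0 evaluates to True, so a row whose c is missing (row 3 here) is kept by either form.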

Upvotes: 2

S Anand

Reputation: 11938

Let's say your DataFrame looks like this:

In [1]: import numpy as np
   ...: import pandas as pd

In [2]: data = pd.DataFrame({
   ...:     'a': [1,2,3,np.nan,5],
   ...:     'b': [3,4,np.nan,5,6],
   ...:     'c': [0,1,2,3,4],
   ...: })

In [3]: data
Out[3]:
    a   b  c
0   1   3  0
1   2   4  1
2   3 NaN  2
3 NaN   5  3
4   5   6  4

To delete rows that have a missing value in any column, use:

In [5]: data.dropna()
Out[5]:
   a  b  c
0  1  3  0
1  2  4  1
4  5  6  4

To delete only the rows where column 'a' is missing, use:

In [6]: data.dropna(subset=['a'])
Out[6]:
   a   b  c
0  1   3  0
1  2   4  1
2  3 NaN  2
4  5   6  4

To delete rows that have a missing value or a zero in any column, filter with all(axis=1), which is False for any row containing a zero, and then call dropna():

In [18]: data[data.all(axis=1)].dropna()
Out[18]:
   a  b  c
1  2  4  1
4  5  6  4
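
For the exact condition in the question (drop a row if a is missing or c is zero, regardless of b), dropna(subset=['a']) can be combined with a comparison on 'c'. A sketch using the same sample data; the name cleaned is only illustrative:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'a': [1, 2, 3, np.nan, 5],
    'b': [3, 4, np.nan, 5, 6],
    'c': [0, 1, 2, 3, 4],
})

# Drop rows where 'a' is missing, then keep only rows where 'c' is nonzero
cleaned = data.dropna(subset=['a'])
cleaned = cleaned[cleaned['c'] != 0]
```

Unlike the all(axis=1) approach, this keeps row 2, whose only missing value is in column 'b'.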

Upvotes: 0
