yasin mohammed
yasin mohammed

Reputation: 471

Select data when specific columns have null value in pandas

I have a dataframe where there are 2 date fields I want to filter and see rows when any one of the date field is null.

ID          Date1       Date2
58844880    04/11/16    NaN
59745846    04/12/16    04/14/16
59743311    04/13/16    NaN
59745848    04/14/16    04/11/16
59598413    NaN         NaN
59745921    04/14/16    04/14/16
59561199    04/15/16    04/15/16
NaN         04/16/16    04/16/16
59561198    NaN         04/17/16

It should look like below

ID          Date1       Date2
58844880    04/11/16    NaN
59743311    04/13/16    NaN
59598413    NaN         NaN
59561198    NaN         04/17/16

Tried the code df = (df['Date1'].isnull() | df['Date1'].isnull())

Upvotes: 14

Views: 31507

Answers (3)

markling
markling

Reputation: 1384

Quickly see if either column has any null values

df.isnull().any()

Count rows that have any null values

df.isnull().sum()

Get rows with null values

(1) Create truth table of null values (i.e. create dataframe with True/False in each column/cell, according to whether it has null value)

truth_table = df.isnull()

(2) Create truth table that shows conclusively which rows have any null values

conclusive_truth_table = truth_table.any(axis='columns')

(3) isolate/show rows that have any null values

df[conclusive_truth_table]

(1)-(3) put it all together

df[df.isnull().any(axis='columns')]

Alternatively

Isolate rows that have null values in any specified column

df.loc[:,['Date1','Date2']].isnull().any(axis='columns')

Isolate rows that have null values in BOTH specified columns

df[ df.loc[ :,['Date1','Date2'] ].isnull().sum(axis=1) == 2]

Upvotes: 5

jezrael
jezrael

Reputation: 862631

Use boolean indexing:

mask = df['Date1'].isnull() | df['Date2'].isnull()
print (df[mask])
           ID     Date1     Date2
0  58844880.0  04/11/16       NaN
2  59743311.0  04/13/16       NaN
4  59598413.0       NaN       NaN
8  59561198.0       NaN  04/17/16

Timings:

#[900000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [12]: %timeit (df[df['Date1'].isnull() | df['Date2'].isnull()])
10 loops, best of 3: 89.3 ms per loop

In [13]: %timeit (df[df.filter(like='Date').isnull().any(1)])
10 loops, best of 3: 146 ms per loop

Upvotes: 12

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210842

try this:

In [7]: df[df.filter(like='Date').isnull().any(1)]
Out[7]:
           ID     Date1     Date2
0  58844880.0  04/11/16       NaN
2  59743311.0  04/13/16       NaN
4  59598413.0       NaN       NaN
8  59561198.0       NaN  04/17/16

Upvotes: 2

Related Questions