Reputation: 101
I need to handle the missing values by dropping a particular row or column. I know how to drop a column or row with missing values, but what I need is a little bit different. I need to drop the rows if and only if two particular columns have missing values.
Here is the example I have tried:
# Basic libraries
import os
import pandas as pd
import numpy as np
data_dict = {'First':[100, 90, np.nan, np.nan],
'Second': [30, 45, 56, np.nan],
'Third':[np.nan, 40, 80, np.nan],
'Forth': [30,40,50,np.nan]}
df1 = pd.DataFrame(data_dict)
df1.dropna(subset=['First','Second'],inplace=True)
df1
As shown, I am trying to drop the rows where First==Nan
and Second==Nan
. In other words, I need both conditions to be correct. So, I need just the last row to be dropped, where both values are Nan. So, row 2 will not be deleted since one of the conditions is satisfied. Unfortunately, the command I've used drops the rows if one of the conditions is satisfied.
Upvotes: 3
Views: 1414
Reputation: 862921
Use how='all'
parameter in DataFrame.dropna
, for you not working because default value is how='any'
:
how: {'any', 'all'}, default 'any'
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
'any' : If any NA values are present, drop that row or column.
'all' : If all values are NA, drop that row or column.
df1.dropna(subset=['First','Second'],inplace=True, how='all')
print (df1)
First Second Third Forth
0 100.0 30.0 NaN 30.0
1 90.0 45.0 40.0 40.0
2 NaN 56.0 80.0 50.0
EDIT: For pass variable list use:
list_columns = ['First','Second','Third','Forth']
df1.dropna(subset=list_columns,inplace=True, how='all')
Upvotes: 4