Reputation: 867
I have multiple datasets with different number of rows and same number of columns. I would like to find Nan values in each column for example consider these two datasets:
dataset1 : dataset2:
a b a b
1 10 2 11
2 9 3 12
3 8 4 13
4 nan nan 14
5 nan nan 15
6 nan nan 16
I want to find nan values in two datasets a and b : if it occurs in column b then remove all the rows that have nan values. and if it occurs in column a then fill that values with 0.
this is my snippet code:
a=pd.notnull(data['a'].values.any())
b= pd.notnull((data['b'].values.any()))
if a:
data = data.dropna(subset=['a'])
if b:
data[['a']] = data[['a']].fillna(value=0)
which does not work properly.
Upvotes: 3
Views: 2246
Reputation: 323226
Pass your condition to a dict
df=df.fillna({'a':0,'b':np.nan}).dropna()
You do not need 'b' here
df=df.fillna({'a':0}).dropna()
EDIT :
df.fillna({'a':0}).dropna()
Out[1319]:
a b
0 2.0 11
1 3.0 12
2 4.0 13
3 0.0 14
4 0.0 15
5 0.0 16
Upvotes: 2
Reputation: 38415
You just need fillna and dropna without control flow
data = data.dropna(subset=['b']).fillna(0)
Upvotes: 4