xyhuang
xyhuang

Reputation: 424

why i cant use np.isnan to filter dataframe?

I have some dataframes, which contain a lot of nan.

i want to make a mask by the frist dataframe, then only keep those columns which contains no np.nan in the first datafame.

let me give an example:

In [69]: df = pd.DataFrame(np.reshape(range(25), (5,5)))

In [70]: df
Out[70]: 
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24

In [71]: df[5] = np.nan

In [72]: df
Out[72]: 
    0   1   2   3   4   5
0   0   1   2   3   4 NaN
1   5   6   7   8   9 NaN
2  10  11  12  13  14 NaN
3  15  16  17  18  19 NaN
4  20  21  22  23  24 NaN
### the following is the mask
In [73]: np.isnan(df)
Out[73]: 
       0      1      2      3      4     5
0  False  False  False  False  False  True
1  False  False  False  False  False  True
2  False  False  False  False  False  True
3  False  False  False  False  False  True
4  False  False  False  False  False  True

In [74]: df[~np.isnan(df)]
Out[74]: 
    0   1   2   3   4   5
0   0   1   2   3   4 NaN
1   5   6   7   8   9 NaN
2  10  11  12  13  14 NaN
3  15  16  17  18  19 NaN
4  20  21  22  23  24 NaN

you can see, i use np.isnan to create a mask. then use df[mask] to filter.

but it looks failed, the output still contains column5. is there anything i used wrongly?

Upvotes: 0

Views: 79

Answers (2)

jezrael
jezrael

Reputation: 862841

EDIT:

If not working any solution below, it means there are no missing values, only strings nans and not np.nans.

So possible solution is replace them:

df = df.replace('nan', np.nan)

You can use it, but cannot filter by it, need Series or 1d mask add DataFrame.all for test ig no values are missing values per rows (also added ~ for inverted mask).

So for filter rows with no NaNs use:

df[~np.isnan(df).all(axis=1)]

Btw, in pandas it is simplier - remove all rows with at least one NaN per rows:

df = df.dropna()

If need filter rows with at least one NaN:

df[np.isnan(df).any(axis=1)]

Upvotes: 1

adir abargil
adir abargil

Reputation: 5745

because you cannot map matrix in elementwise approach... you can remove either rows or columns:

df[~np.isnan(df).all(axis=1)]

Upvotes: 1

Related Questions