Reputation: 424
I have some dataframes which contain a lot of NaN values.
I want to make a mask from the first dataframe, and then keep only those columns which contain no np.nan in the first dataframe.
Let me give an example:
In [69]: df = pd.DataFrame(np.reshape(range(25), (5,5)))
In [70]: df
Out[70]:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
4 20 21 22 23 24
In [71]: df[5] = np.nan
In [72]: df
Out[72]:
0 1 2 3 4 5
0 0 1 2 3 4 NaN
1 5 6 7 8 9 NaN
2 10 11 12 13 14 NaN
3 15 16 17 18 19 NaN
4 20 21 22 23 24 NaN
### the following is the mask
In [73]: np.isnan(df)
Out[73]:
0 1 2 3 4 5
0 False False False False False True
1 False False False False False True
2 False False False False False True
3 False False False False False True
4 False False False False False True
In [74]: df[~np.isnan(df)]
Out[74]:
0 1 2 3 4 5
0 0 1 2 3 4 NaN
1 5 6 7 8 9 NaN
2 10 11 12 13 14 NaN
3 15 16 17 18 19 NaN
4 20 21 22 23 24 NaN
You can see that I use np.isnan to create a mask and then df[~mask] to filter.
But it seems to fail: the output still contains column 5. Did I use something incorrectly?
Upvotes: 0
Views: 79
Reputation: 862841
EDIT:
If none of the solutions below work, it means there are no missing values at all, only the string 'nan' rather than np.nan.
A possible fix is to replace those strings first:
df = df.replace('nan', np.nan)
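As a minimal sketch of that situation (the small frame below is made up purely for illustration), the string 'nan' is not detected as missing until it is replaced:

import numpy as np
import pandas as pd

# hypothetical frame where missing values were stored as the string 'nan'
df = pd.DataFrame({'a': ['1', '2', 'nan'], 'b': ['4', 'nan', '6']})
print(df.isna().sum().sum())   # 0 - the strings are not recognized as missing
df = df.replace('nan', np.nan)
print(df.isna().sum().sum())   # 2 - now they are real NaNs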
You can create such a mask, but you cannot filter with a 2d mask directly - boolean indexing of rows needs a Series (a 1d mask). Add DataFrame.all to test whether all values in a row are missing (together with ~ for the inverted mask). So to remove rows where every value is NaN use:
df[~np.isnan(df).all(axis=1)]
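For illustration, reusing the example frame from the question (only column 5 is all NaN), the 1d mask looks like this:

mask = np.isnan(df).all(axis=1)   # one boolean per row
print(mask.tolist())              # [False, False, False, False, False] - no row is entirely NaN
print(df[~mask].shape)            # (5, 6) - nothing is removed in this particular example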
Btw, in pandas it is simpler - dropna removes all rows with at least one NaN:
df = df.dropna()
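A quick sketch of the dropna variants relevant here (standard pandas parameters, applied to the question's example frame):

df.dropna()            # drop rows containing any NaN -> empty here, every row has NaN in column 5
df.dropna(how='all')   # drop only rows that are entirely NaN -> keeps all rows here
df.dropna(axis=1)      # drop columns containing any NaN -> removes column 5, which is what the question asks for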
If you need to filter rows with at least one NaN:
df[np.isnan(df).any(axis=1)]
Upvotes: 1
Reputation: 5745
Because you cannot filter with the matrix in an element-wise approach... you can remove either whole rows or whole columns:
df[~np.isnan(df).all(axis=1)]
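And since this answer mentions columns as well, a column-wise version of the same mask idea (assuming the question's example frame) would be:

df.loc[:, ~np.isnan(df).any(axis=0)]   # keep only columns with no NaN - drops column 5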
Upvotes: 1