Pandas dropna, which rows are being dropped

Question

This is the data frame that I have:

                   A         B         C  D   F   E
2013-01-01  0.000000  0.000000  0.100928  5 NaN   1
2013-01-02  0.640525  0.220630  1.070226  5   1   1
2013-01-03 -0.963793 -0.476044 -0.581649  5   2 NaN
2013-01-04  0.882686 -0.371904 -1.320758  5   3 NaN
2013-01-05  0.021979  0.680987 -0.605329  5   4 NaN
2013-01-06 -0.238726 -0.487410 -0.383292  5   5 NaN

I then run the following code: df1.dropna(how='any'), where df1 is the above data frame. When I look at df1 afterwards, this is what I get.

                   A         B         C  D   F   E
2013-01-01  0.000000  0.000000  0.100928  5 NaN NaN
2013-01-02  0.640525  0.220630  1.070226  5   1 NaN
2013-01-03 -0.963793 -0.476044 -0.581649  5   2 NaN
2013-01-04  0.882686 -0.371904 -1.320758  5   3 NaN

I thought that dropna drops any row that has a NaN value in it. Therefore, I was expecting it to return just this:

                   A         B         C  D   F   E
2013-01-02  0.640525  0.220630  1.070226  5   1   1

Why isn't that the case?

EDIT: Here is the code

This is what I start with:

import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

then I do this:

s1 = pd.Series([1,2,3,4,5,6],index=pd.date_range('20130102',periods=6))
df['F'] = s1
df.at[dates[0],'A'] = 0
df.iat[0,1] = 0
df.loc[:,'D'] = np.array([5]*len(df))
df1 = df.reindex(index=dates[0:4], columns = list(df.columns) + ['E'])
df1.loc[dates[0]:dates[1],'E'] = 1

and then I run dropna on df1.

Davis Kirkendall · Accepted Answer

dropna returns a new DataFrame and leaves df1 unchanged. To get the result you are looking for, assign the return value:

df2 = df1.dropna(how='any')

Now df2 holds the desired output. If you want df1 itself to hold the result, use:

df1.dropna(how='any', inplace=True)

which modifies df1 in place. Hope this helps!
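
For a quick check of the difference, here is a minimal sketch. The small frame below is made up for illustration and is not the data from the question; only the column names F and E are borrowed from it, and rows_before is a name I introduce here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration only (not the asker's data).
df1 = pd.DataFrame(
    {"F": [np.nan, 1.0, 2.0, 3.0], "E": [1.0, 1.0, np.nan, np.nan]},
    index=pd.date_range("20130101", periods=4),
)
rows_before = len(df1)

# dropna returns a new DataFrame; df1 keeps all of its rows.
df2 = df1.dropna(how="any")
print(len(df1), len(df2))

# With inplace=True the rows are removed from df1 itself.
df1.dropna(how="any", inplace=True)
print(df1.equals(df2))
```

Only the row for 2013-01-02 has no NaN in either column, so it is the only row left in df2, and in df1 once the in-place call has run.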
