All-NaN column is plotted as an all-0 column in pandas

Question

I am having trouble plotting a sliced DataFrame in which entire columns are filled with NaNs.

How come:

import numpy as np
import pandas as pd

pd.DataFrame(
    dict(
        A=pd.Series([np.nan] * 32),
        B=pd.Series(range(-1, 32)),
    )
).plot()

differs from:

# Ugly fix: a leading 0 keeps column A from being all NaN
pd.DataFrame(
    dict(
        A=pd.Series([0] + [np.nan] * 32),
        B=pd.Series(range(-1, 32)),
    )
).plot()

in that the first draws a line at 0 for column A, as if the column were filled with zeros? Shouldn't the first snippet behave just like:

pylab.plot(
    range(0, 33),
    range(-1, 32),
    range(0, 32),
    [np.nan] * 32,
)

Plotting a Series filled only with NaN also works fine:

pd.Series([np.nan] * 32).plot()

What am I missing? Is there a right way to plot a column that is all NaNs, or is this a bug?

lbolla · Accepted Answer

This looks like a bug in pandas. The relevant source is in pandas.tools.plotting, lines 554-556:

empty = df[col].count() == 0
# is this right?
values = df[col].values if not empty else np.zeros(len(df))

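count() returns the number of non-NaN values, so if the column contains only NaNs, empty is True and values is set to np.zeros(len(df)). That array of zeros is what gets drawn as the flat line at 0.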
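To see the effect without drawing anything, here is a minimal sketch that copies the quoted lines into a standalone helper. The name values_as_plotted is hypothetical and is not part of pandas; the frame is the one from the question.

```python
import numpy as np
import pandas as pd


def values_as_plotted(df, col):
    """Hypothetical helper mirroring the quoted lines; not a pandas API."""
    # count() ignores NaN, so an all-NaN column has a count of 0.
    empty = df[col].count() == 0
    # An "empty" column is replaced by zeros instead of its NaN values.
    return df[col].values if not empty else np.zeros(len(df))


# Same frame as in the question: A is all NaN, B holds -1..31.
# The two Series are aligned on the union of their indexes (33 rows).
df = pd.DataFrame(
    dict(
        A=pd.Series([np.nan] * 32),
        B=pd.Series(range(-1, 32)),
    )
)

a_values = values_as_plotted(df, "A")  # zeros, not NaN
b_values = values_as_plotted(df, "B")  # the column's real data
```

With a leading 0 in column A, as in the "ugly fix" from the question, count() is 1, so the real values (one 0 followed by NaNs) are used instead of the array of zeros.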

Note: I did not add the "is this right?" comment; it is in the pandas source code (v0.8.1).

I've raised a bug about it: https://github.com/pydata/pandas/issues/1696
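On an affected version, one possible stopgap is to drop the all-NaN columns before plotting. This is a sketch under that assumption and is not something the bug report suggests; it relies only on DataFrame.dropna with axis=1 and how="all". The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: A is all NaN, B holds -1..31.
df = pd.DataFrame(
    dict(
        A=pd.Series([np.nan] * 32),
        B=pd.Series(range(-1, 32)),
    )
)

# Drop only the columns in which every value is NaN;
# columns with at least one real value are kept untouched.
plottable = df.dropna(axis=1, how="all")
remaining_columns = list(plottable.columns)

# plottable.plot() would then draw column B only.
```

The trade-off is that column A also disappears from the legend, whereas matplotlib given a list of NaNs keeps the series and simply draws nothing for it.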
