Reputation: 6501
I have data in a pandas DataFrame whose row labels are datetimes (a pandas.tseries index). I can plot the data in matplotlib and I get this figure:
However, I want to use plotly to draw the same graph in interactive mode. It works as follows, but it doesn't show the datetimes; instead, it replaces the x-axis with integer indexing!
https://plot.ly/~vmirjalily/5/
The figure in the URL above is plotted using this code:
dfmean = df.mean(axis=1)
dfmean_mavg = pd.rolling_mean(dfmean, 50)
dfmean.plot(linewidth=1.5, label='Mean of 20')
dfmean_mavg.plot(linewidth=3, label='Moving Avg.')
#plt.legend(loc=2)
l1 = plt.plot(dfmean, 'b-', linewidth=3)
l2 = plt.plot(dfmean_mavg, 'g-', linewidth=4)
mpl_fig1 = plt.gcf()
py.iplot_mpl(mpl_fig1, filename='avg-price.20stocks')
But this code doesn't show the datetime index on the x-axis. I tried to force the datetime index as below:
l1 = plt.plot(np.array(dfmean.index), dfmean, 'b-', linewidth=3)
l2 = plt.plot(np.array(dfmean_mavg.index), dfmean_mavg, 'g-', linewidth=4)
mpl_fig1 = plt.gcf()
py.iplot_mpl(mpl_fig1, filename='avg-price.20stocks')
but it gave a long list of errors, as below:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-35-4a3ca217202d> in <module>()
14 mpl_fig1 = plt.gcf()
15
---> 16 py.iplot_mpl(mpl_fig1, filename='avg-price.20stocks')
/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in iplot_mpl(fig, resize, strip_style, update, **plot_options)
257 "object. Run 'help(plotly.graph_objs.Figure)' for more info."
258 )
--> 259 return iplot(fig, **plot_options)
260
261
/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in iplot(figure_or_data, **plot_options)
113 if 'auto_open' not in plot_options:
114 plot_options['auto_open'] = False
--> 115 res = plot(figure_or_data, **plot_options)
116 urlsplit = res.split('/')
117 username, plot_id = urlsplit[-2][1:], urlsplit[-1] # TODO: HACKY!
/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in plot(figure_or_data, validate, **plot_options)
212 pass
213 plot_options = _plot_option_logic(plot_options)
--> 214 res = _send_to_plotly(figure, **plot_options)
215 if res['error'] == '':
216 if plot_options['auto_open']:
/usr/local/lib/python2.7/dist-packages/plotly/plotly/plotly.pyc in _send_to_plotly(figure, **plot_options)
971 fig = tools._replace_newline(figure) # does not mutate figure
972 data = json.dumps(fig['data'] if 'data' in fig else [],
--> 973 cls=utils._plotlyJSONEncoder)
974 username, api_key = _get_session_username_and_key()
975 kwargs = json.dumps(dict(filename=plot_options['filename'],
/usr/lib/python2.7/json/__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, **kw)
236 check_circular=check_circular, allow_nan=allow_nan, indent=indent,
237 separators=separators, encoding=encoding, default=default,
--> 238 **kw).encode(obj)
239
240
/usr/lib/python2.7/json/encoder.pyc in encode(self, o)
199 # exceptions aren't as detailed. The list call should be roughly
200 # equivalent to the PySequence_Fast that ''.join() would do.
--> 201 chunks = self.iterencode(o, _one_shot=True)
202 if not isinstance(chunks, (list, tuple)):
203 chunks = list(chunks)
/usr/lib/python2.7/json/encoder.pyc in iterencode(self, o, _one_shot)
262 self.key_separator, self.item_separator, self.sort_keys,
263 self.skipkeys, _one_shot)
--> 264 return _iterencode(o, 0)
265
266 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
/usr/local/lib/python2.7/dist-packages/plotly/utils.pyc in default(self, obj)
144 if s is not None:
145 return s
--> 146 raise e
147 return json.JSONEncoder.default(self, obj)
148
TypeError: masked is not JSON serializable
Here are my package versions:
IPython 2.0.0
numpy 1.9.0
numexpr 2.2.2
pandas 0.15.0
matplotlib 1.4.0
plotly 1.4.7
And the first 10 lines of my dataframe:
Date
2011-01-04 54.2430
2011-01-05 54.3935
2011-01-06 54.4665
2011-01-07 54.5920
2011-01-10 54.9435
2011-01-11 54.9340
2011-01-12 55.4755
2011-01-13 55.5495
2011-01-14 56.0230
dtype: float64
Upvotes: 2
Views: 2568
Reputation: 957
There are a couple things going on here.
This traceback is telling you that you can't serialize masked numbers. Masked numbers are slightly different than NaN. Here's a bit of info if you're curious: http://pandas.pydata.org/pandas-docs/dev/gotchas.html#nan-integer-na-values-and-na-type-promotions
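The reason you have masked numbers is the moving average calculation you do. With the default settings, it leaves the first N - 1 values masked, where N is the number of points you're averaging over (50 here), because those positions don't yet have a full window of data.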
Therefore, if you get rid of the masked values by manipulating the data frame, you wouldn't see that issue any more.
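To illustrate where those gaps come from, here is a minimal plain-Python sketch. The `rolling_mean` helper is a hypothetical stand-in written for this example, not the pandas function, and it uses NaN where the real pipeline ends up with masked values. The sample prices are the first few values from the question.

```python
import math


def rolling_mean(values, window):
    """Trailing moving average; positions without a full window are NaN."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough history yet, so there is nothing to average.
            out.append(float("nan"))
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


prices = [54.2430, 54.3935, 54.4665, 54.5920, 54.9435, 54.9340]
smoothed = rolling_mean(prices, 3)

# Keep only the positions where the moving average is defined.
valid = [(i, v) for i, v in enumerate(smoothed) if not math.isnan(v)]
```

With a window of 3, the first two entries of `smoothed` are undefined and `valid` starts at index 2. On a real pandas Series, calling `.dropna()` on the moving-average result should have the same effect.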
Taking a cue from what DataFrame.to_json() does with masked values (it turns them into null), the most appropriate replacement value in your list would be None, if you try to go down that road. None translates best to null.
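As a rough sketch of that replacement, assuming the missing points show up as float NaN once the data is in a plain list (the `nan_to_none` helper is a hypothetical name used only for this example):

```python
import json
import math


def nan_to_none(values):
    """Replace float NaN entries with None so they serialize as JSON null."""
    return [
        None if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


y = [float("nan"), float("nan"), 54.3677, 54.484]
payload = json.dumps({"y": nan_to_none(y)})
print(payload)  # {"y": [null, null, 54.3677, 54.484]}
```

Without the replacement, the standard `json.dumps` would emit the bare token `NaN` by default, which is not valid JSON.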
A bit of background. When dates are in matplotlib, they are floating-point values representing the number of days since 0001-01-01 + 1 (see matplotlib dates for more info). However, importing pandas will alter this to use a different date representation, the number of days since the Unix epoch, which is another floating-point number. Version 1.4.7 of plotly was meant to handle both representations by converting back to an ISO string, but perhaps there's another avenue that you've found. I can't seem to recreate this error on my end, though. Here's the code I tried:
import random
import pandas as pd
import matplotlib.pyplot as plt
import plotly.plotly as py
import plotly.tools as tls
num_pts = 1000
data = [random.random() for i in range(num_pts)]
index = pd.date_range('2011-01-04', periods=num_pts)
df = pd.DataFrame(data=data, index=index)
dfmean = df.mean(axis=1)
dfmean_mavg = pd.rolling_mean(dfmean, 50)
dfmean.plot(linewidth=1.5, label='Mean of 20')
# dfmean_mavg.plot(linewidth=3, label='Moving Avg.')
mpl_fig1 = plt.gcf()
py.plot_mpl(mpl_fig1, filename='avg-price.20stocks')
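To make the two date representations mentioned in the background note concrete, here is a small standard-library sketch that converts each kind of float back to an ISO string. The function names are hypothetical, and this is not plotly's actual conversion code, only an illustration of the arithmetic.

```python
from datetime import date, datetime, timedelta


def from_ordinal_days(x):
    """Interpret x as days since 0001-01-01 plus one."""
    whole = int(x)
    return datetime.fromordinal(whole) + timedelta(days=x - whole)


def from_epoch_days(x):
    """Interpret x as days since the Unix epoch (1970-01-01)."""
    return datetime(1970, 1, 1) + timedelta(days=x)


d = date(2011, 1, 4)  # first date in the question's sample data
ordinal_days = float(d.toordinal())
epoch_days = float((d - date(1970, 1, 1)).days)

iso_from_ordinal = from_ordinal_days(ordinal_days).isoformat()
iso_from_epoch = from_epoch_days(epoch_days).isoformat()
```

Both conversions give the same ISO string for the same calendar date, even though the two floats differ by a large constant offset. That offset is why a converter has to know which representation it was handed.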
About calling plt.plot on the series: it looks like you're trying to plot portions of your data twice? I'm more familiar with calling the plot method directly on a data frame, which is why I chose to only include this version in the code snippet above.
There's a PR open on Plotly's python api GH repo to handle this: https://github.com/plotly/python-api/pull/159. It should be up on PyPi tomorrow.
Upvotes: 3