Reputation: 33
I am struggling to replicate the elegant ease - and successful outcome - teasingly promised in the 'Basic Plotting:plot' section of the pandas df.plot() documentation at:
http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization
There the authors' first image is pretty close to the kind of line graph I want to plot from my dataframe. Their first df produces a single-line plot, just as I hoped my df below would when plotted.
My df looks like this:
            2014-03-28  2014-04-04  2014-04-11  2014-04-18  \
Jenny Todd      1699.6      1741.6      1710.7      1744.2

            2014-04-25  2014-05-02  2014-05-09
Jenny Todd      1764.2      1789.7      1802.3
Their second image is a multi-line graph very similar to what I hoped for when I plot a multiple-index version of my df, e.g.:
                 2014-06-13  2014-06-20  2014-06-27  \
William Acer         1674.7      1689.4      1682.0
Katherine Baker      1498.5      1527.3      1530.5

                 2014-07-04  2014-07-11  2014-07-18  \
William Acer         1700.0      1674.5      1677.8
Katherine Baker      1540.4      1522.3      1537.3

                 2014-07-25
William Acer         1708.0
Katherine Baker      1557.1
However, they get plots. I get featureless 3.3kb images and a warning:
/home/lee/test/local/lib/python2.7/site-packages/matplotlib/axes/_base.py:2787: UserWarning: Attempting to set identical left==right results in singular transformations; automatically expanding. left=0.0, right=0.0 'left=%s, right=%s') % (left, right))
The authors of the documentation seem to have the plot() function deducing from the df's indexes the values of the x-axis and the range and values of the y axis.
Searching around, I can find people with different data, different indexes and different scenarios (for example, plotting one column against another or trying to produce multiple subplots) who get this kind of 'axes' error. However, I haven't been able to map their issues to mine.
I wonder if anyone can help resolve what is different about my data or code that leads to a different plot outcome from the documentation's seemingly-similar data and seemingly-similar code.
My code:
print plotting_df # (This produces the df examples I pasted above)
plottest = plotting_df.plot.line(title='Calorie Intake', legend=True)
plottest.set_xlabel('Weeks')
plottest.set_ylabel('Calories')
fig = plt.figure()
plot_name = week_ending + '_' + collection_name + '.png'
fig.savefig(plot_name)
Note this dataframe is being created dynamically many times within the script. On any given run, the script will acquire different sets of dates, differently-named people, and different numbers to plot. So I don't have predictability about what strings will come up for index and legend labels for plotting beforehand. I do have predictability about the format.
I get that my dataframe's date index has differently-formatted dates than the referred documentation describes. Is this the cause? Whether it is or isn't, how should one best handle this issue?
Added on 2016-08-24 to answer the comment below about being unable to recreate my data
plotting_df is created on the fly as a subset of a much larger dataframe. It's simply an index (or sometimes multiple indices) and some of the date columns extracted from the larger dataframe. The code that produces plotting_df works fine and always produces plotting_df with correct indices and columns in a format I expect.
I can simulate creation of a dataset to store in plotting_df with this python code:
plotting_1 = {
    '2014-03-28': 1699.6,
    '2014-04-04': 1741.6,
    '2014-04-11': 1710.7,
    '2014-04-18': 1744.2,
    '2014-04-25': 1764.2,
    '2014-05-02': 1789.7,
    '2014-05-09': 1802.3
}
plotting_df = pd.DataFrame(plotting_1, index=['Jenny Todd'])
and I can simulate creation of a multiple-indices plotting_df with this python code:
plotting_2 = {
    'Katherine Baker': {
        '2014-06-13': 1498.5,
        '2014-06-20': 1527.3,
        '2014-06-27': 1530.5,
        '2014-07-04': 1540.4,
        '2014-07-11': 1522.3,
        '2014-07-18': 1537.3,
        '2014-07-25': 1557.1
    },
    'William Acer': {
        '2014-06-13': 1674.7,
        '2014-06-20': 1689.4,
        '2014-06-27': 1682.0,
        '2014-07-04': 1700.0,
        '2014-07-11': 1674.5,
        '2014-07-18': 1677.8,
        '2014-07-25': 1708.0
    }
}
plotting_df = pd.DataFrame.from_dict(plotting_2)
I did try the suggested transform with code:
plotdf = plotting_df.T
plotdf.index = pd.to_datetime(plotdf.index)
so that my original code now looks like:
print plotting_df # (This produces the df examples I pasted above)
plotdf = plotting_df.T # Transform the df - date columns to indices
plotdf.index = pd.to_datetime(plotdf.index) # Convert indices to datetime
plottest = plotdf.plot.line(title='Calorie Intake', legend=True)
plottest.set_xlabel('Weeks')
plottest.set_ylabel('Calories')
fig = plt.figure()
plot_name = week_ending + '_' + collection_name + '.png'
fig.savefig(plot_name)
but I still get the same result (blank 3.3kb images created).
I did note that adding the transform made no difference when I printed out the first instance of plotdf. So should I be doing some other transform?
Upvotes: 3
Views: 247
Reputation: 3272
This is your problem:
fig = plt.figure()
plot_name = week_ending + '_' + collection_name + '.png'
fig.savefig(plot_name)
You are creating a second figure after the first one has already been drawn, and then saving only that second, empty figure. Just take out the line fig = plt.figure() and change fig.savefig to plt.savefig, which saves the current figure (the one the pandas plot was drawn on).
So you should have:
print plotting_df # (This produces the df examples I pasted above)
plotdf = plotting_df.T # Transform the df - date columns to indices
plotdf.index = pd.to_datetime(plotdf.index) # Convert indices to datetime
plottest = plotdf.plot.line(title='Calorie Intake', legend=True)
plottest.set_xlabel('Weeks')
plottest.set_ylabel('Calories')
plot_name = week_ending + '_' + collection_name + '.png'
plt.savefig(plot_name)
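To see the difference for yourself, here is a minimal self-contained sketch using the single-person data from the question. It is only an illustration under some assumptions: the Agg backend is chosen so it runs without a display, the file name example_calories.png is made up, and it saves through the Axes object's own figure (ax.get_figure()) rather than plt.savefig, which is an alternative that does not depend on which figure is "current".

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Same shape as the single-person frame in the question: one row, date columns.
plotting_df = pd.DataFrame(
    {
        "2014-03-28": 1699.6,
        "2014-04-04": 1741.6,
        "2014-04-11": 1710.7,
        "2014-04-18": 1744.2,
        "2014-04-25": 1764.2,
        "2014-05-02": 1789.7,
        "2014-05-09": 1802.3,
    },
    index=["Jenny Todd"],
)

# Transpose so the dates become the index, then parse them as datetimes.
plotdf = plotting_df.T
plotdf.index = pd.to_datetime(plotdf.index)

# DataFrame.plot returns a matplotlib Axes, which knows its own figure.
ax = plotdf.plot.line(title="Calorie Intake", legend=True)
ax.set_xlabel("Weeks")
ax.set_ylabel("Calories")
fig = ax.get_figure()

# A fresh plt.figure() is a different, empty figure: this is what was being saved.
blank = plt.figure()
print(len(blank.axes))  # 0
print(len(fig.axes))    # 1

plot_name = "example_calories.png"  # hypothetical file name for this sketch
fig.savefig(plot_name)              # saves the figure that holds the plot

plt.close(blank)
plt.close(fig)
```

Saving via ax.get_figure() may be the safer choice in a script that builds many plots in a loop, since each save is tied to the Axes you just drew on. Closing figures after saving also keeps them from accumulating in memory.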
Upvotes: 1