Reputation: 9638
I often find myself wanting to plot data in one column by another, but find it hard to group/separate them by a 3th column.
Suppose I have a sheet like this
How would I create the same plot in pandas?
BTW: I like that the x-axis is linear and not just group of dates aligned next to each other since it gives and idea about how close the measurements are made to each other within a group - but would be nice to know how to do both in case the distance is too far.
The answer from @Ffisegydd has been very useful. However I was a bit too fast accepting the answer - I discovered that when trying out the code on an actual excel sheet. The problem is entirely my fault since I didn't provide an Excel sheet. @Ffisegydd was so kind to manually create the dataframe from my question, but working with excel files is a bit different.
I do apoligize. Here is an Excel file: https://dl.dropboxusercontent.com/u/3216968/Example.xlsx
This is how far I got (in IPython notebook)
import pandas as pd
import datetime as dt
path2file = r"C:\Example.xlsx"
_xl = pd.ExcelFile(path2file)
df = pd.read_excel(path2file, _xl.sheet_names[0], header=0)
df
df.Date = df.Date.apply( lambda x: dt.datetime.strptime(x, '%Y.%m.%d').date() )
df
Here is where things go wrong:
pd.DataFrame( data= [df.Data, df.Group], columns = ['Data', 'Group'], index=df.Date)
Giving this error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-9-231baa928f67> in <module>()
----> 1 pd.DataFrame( data= [df.Data, df.Group], columns = ['Data', 'Group'], index=df.Date)
C:\Python27\lib\site-packages\pandas\core\frame.pyc in __init__(self, data, index, columns, dtype, copy)
245 index = _default_index(len(data))
246 mgr = _arrays_to_mgr(arrays, columns, index, columns,
--> 247 dtype=dtype)
248 else:
249 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
C:\Python27\lib\site-packages\pandas\core\frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
4471 axes = [_ensure_index(columns), _ensure_index(index)]
4472
-> 4473 return create_block_manager_from_arrays(arrays, arr_names, axes)
4474
4475
C:\Python27\lib\site-packages\pandas\core\internals.pyc in create_block_manager_from_arrays(arrays, names, axes)
3757 return mgr
3758 except (ValueError) as e:
-> 3759 construction_error(len(arrays), arrays[0].shape[1:], axes, e)
3760
3761
C:\Python27\lib\site-packages\pandas\core\internals.pyc in construction_error(tot_items, block_shape, axes, e)
3729 raise e
3730 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3731 passed,implied))
3732
3733 def create_block_manager_from_blocks(blocks, axes):
ValueError: Shape of passed values is (2,), indices imply (2, 12)
or doing this
pd.DataFrame( {'data': df.Data, 'group': df.Group}, index=df.Date)
Upvotes: 3
Views: 6848
Reputation: 53668
You can create a groupby
object and then iterate over the groups and plot.
Below is some code which takes your data and plots the two "groups". There is some extra formatting to make the graph look good as well.
import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
path2file = r"Example.xlsx"
_xl = pd.ExcelFile(path2file)
df = pd.read_excel(path2file, _xl.sheet_names[0], header=0)
df.Date = df.Date.apply( lambda x: dt.datetime.strptime(x, '%Y.%m.%d').date())
df.index = df.Date # Set the Date column as your index
del df['Date'] # Remove the Date column from your data columns
grouped = df.groupby('Group') # groupby object
# Normally you would just iterate using "for k, g in grouped:" but the i
# is necessary for selecting a color.
colors = ['red', 'blue']
for i, (k, g) in enumerate(grouped):
plt.plot_date(g['Data'].index, g['Data'], linestyle='None', marker='o', mfc=colors[i], label=k)
plt.legend()
plt.gcf().autofmt_xdate() # Format the dates with a diagonal slant to make them fit.
# Pad the data out so all markers can be seen.
pad = dt.timedelta(days=7)
plt.xlim((min(df.index)-pad, max(df.index)+pad))
plt.ylim(0,6)
Upvotes: 4
Reputation: 7022
This should work
df.pivot_table(rows=['Date'], cols=['Group'], values=['Data']).plot()
But be aware that each data point will be a 'mean' of the data points in a particular group for a given day
Upvotes: 2