Reputation: 2886
Below is the data frame I wish to represent as a histogram, with each row as a point. The result won't be very interesting, since it gives three bins of equal height. That's OK for now, so read on!
>>> outer_df
   patient                         cell  product
0    Pat_1               22RV1_PROSTATE       12
1    Pat_1               DU145_PROSTATE       15
2    Pat_1  LN18_CENTRAL_NERVOUS_SYSTEM        9
3    Pat_2               22RV1_PROSTATE       12
4    Pat_2               DU145_PROSTATE       15
5    Pat_2  LN18_CENTRAL_NERVOUS_SYSTEM        9
6    Pat_3               22RV1_PROSTATE       12
7    Pat_3               DU145_PROSTATE       15
8    Pat_3  LN18_CENTRAL_NERVOUS_SYSTEM        9
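To make the examples in this thread reproducible, the frame above can be rebuilt from the printed values. This is a minimal sketch; how the original frame was actually created is not shown, so the construction below is an assumption that only matches the printout.

```python
import pandas as pd

# Three patients, each paired with the same three cell lines and product values
patients = ["Pat_1", "Pat_2", "Pat_3"]
cells = ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"]
products = [12, 15, 9]

outer_df = pd.DataFrame(
    {
        # Pat_1, Pat_1, Pat_1, Pat_2, ... (one row per patient/cell pair)
        "patient": [p for p in patients for _ in cells],
        # Cell lines repeat in the same order for every patient
        "cell": cells * len(patients),
        "product": products * len(patients),
    }
)
print(outer_df)
```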
I want to graph each row as a point on a histogram, but also be able to pick out a particular subset of the data and graph it as an overlaid histogram (e.g. all points from all cells would be in purple, those belonging to just DU145_PROSTATE in red, and those from 22RV1_PROSTATE in blue). I illustrated this with a graphic from the pandas docs (image not reproduced here).
I first tried to use the hist method for DataFrames, but encountered an error and a blank 4x4 series of histograms:
>>> outer_df.hist()
Traceback (most recent call last):
File "/usr/lib/python3.3/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1977, in hist_frame
ax.hist(data[col].dropna().values, **kwds)
File "/usr/lib/python3/dist-packages/matplotlib/axes.py", line 8099, in hist
xmin = min(xmin, xi.min())
TypeError: unorderable types: str() < float()
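The TypeError appears to come from matplotlib trying to find the minimum of the string columns (patient, cell) alongside numbers. A minimal sketch that restricts the histogram to the numeric product column, assuming a pandas version whose DataFrame.hist accepts the column argument (current releases do); the bins=3 value and the output filename are my own choices:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

outer_df = pd.DataFrame(
    {
        "patient": ["Pat_1"] * 3 + ["Pat_2"] * 3 + ["Pat_3"] * 3,
        "cell": ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
        "product": [12, 15, 9] * 3,
    }
)

# Histogram only the numeric column; the string columns ('patient', 'cell')
# are what matplotlib could not order in the traceback above.
axes = outer_df.hist(column="product", bins=3)
plt.savefig("product_hist.png")
```

outer_df["product"].hist(bins=3) should give the same single histogram via the Series method.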
Realizing that DataFrame.hist() "plots the histograms of the columns on multiple subplots", I moved away from it and tried outer_df.plot(kind='hist', stacked=True). Even though I took this directly from the docs, I'm stuck on this error:
>>> outer_df.plot(kind='hist', stacked=True)
Traceback (most recent call last):
File "/usr/lib/python3.3/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1612, in plot_frame
raise ValueError('Invalid chart type given %s' % kind)
ValueError: Invalid chart type given hist
Reshaping first, so that each cell line becomes its own column, gives the same error:
>>> outer_df.set_index(['patient', 'cell']).unstack('cell').plot(kind='hist', stacked=True)
Traceback (most recent call last):
File "/usr/lib/python3.3/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1612, in plot_frame
raise ValueError('Invalid chart type given %s' % kind)
ValueError: Invalid chart type given hist
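As far as I know, kind='hist' was only added to DataFrame.plot in pandas 0.15, so the installed version (note the old pandas/tools/plotting.py path) probably predates it; this is an assumption, not something the traceback proves. A version-independent sketch reshapes to one column per cell line and calls matplotlib's Axes.hist directly, which treats each column of a 2-D array as a separate dataset:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

outer_df = pd.DataFrame(
    {
        "patient": ["Pat_1"] * 3 + ["Pat_2"] * 3 + ["Pat_3"] * 3,
        "cell": ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
        "product": [12, 15, 9] * 3,
    }
)

# One row per patient, one column per cell line
wide = outer_df.pivot(index="patient", columns="cell", values="product")

fig, ax = plt.subplots()
# Each column of wide.values is one dataset; stacked=True piles them up
counts, edges, patches = ax.hist(
    wide.values, bins=3, stacked=True, label=list(wide.columns)
)
ax.legend()
fig.savefig("stacked_hist.png")
```

With stacked=True the returned counts are cumulative, so the last entry holds the totals per bin.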
Upvotes: 1
Views: 5229
Reputation: 5372
How about this, using the groupby method:
hist_data = {cell: outer_df.loc[inds, 'product'] for cell, inds in outer_df.groupby('cell').groups.items()}
Each value in the dict is a Series holding the product values of one cell group. Next, iterate over the cell groups, plotting a histogram each time:
import matplotlib.pyplot as plt

for cell in hist_data:
    hist_data[cell].hist(label=cell)  # each call draws onto the same current axes
plt.legend()  # needed for the legend to show
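A variation on the same groupby idea that also matches the colour scheme from the question: all rows in purple underneath, with DU145_PROSTATE in red and 22RV1_PROSTATE in blue overlaid. This is a sketch under my own assumptions: the shared bin edges, alpha=0.5 transparency and the output filename are illustrative choices, not part of the answer above.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

outer_df = pd.DataFrame(
    {
        "patient": ["Pat_1"] * 3 + ["Pat_2"] * 3 + ["Pat_3"] * 3,
        "cell": ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
        "product": [12, 15, 9] * 3,
    }
)

# Shared bin edges (three bins) so the overlaid histograms line up
bins = np.linspace(outer_df["product"].min(), outer_df["product"].max(), 4)

fig, ax = plt.subplots()
# Background: every row, regardless of cell line
ax.hist(outer_df["product"], bins=bins, color="purple", alpha=0.5, label="all cells")

# Overlay: only the selected cell lines, each in its own colour
colours = {"DU145_PROSTATE": "red", "22RV1_PROSTATE": "blue"}
for cell, group in outer_df.groupby("cell"):
    if cell in colours:
        ax.hist(group["product"], bins=bins, color=colours[cell], alpha=0.5, label=cell)

ax.legend()
fig.savefig("overlaid_hist.png")
```

Iterating the groupby object directly yields (name, sub-frame) pairs, which avoids building the intermediate dict.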
Upvotes: 1
Reputation: 13788
How about:
outer_df.set_index(['patient', 'cell']).unstack('cell').plot(kind='hist', stacked=True)
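For reference, the set_index(...).unstack('cell') step turns the long frame into one row per patient and one column per cell line, and plot(kind='hist') then histograms those columns. A small sketch that inspects the intermediate and draws the stacked plot, assuming a pandas version where kind='hist' is supported; bins=3 and the filename are my own choices:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import pandas as pd

outer_df = pd.DataFrame(
    {
        "patient": ["Pat_1"] * 3 + ["Pat_2"] * 3 + ["Pat_3"] * 3,
        "cell": ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
        "product": [12, 15, 9] * 3,
    }
)

wide = outer_df.set_index(["patient", "cell"]).unstack("cell")
# Columns form a MultiIndex of ('product', <cell name>) pairs
print(wide.columns.tolist())
print(wide)

# One histogram series per column, stacked on top of each other
ax = wide.plot(kind="hist", stacked=True, bins=3)
ax.figure.savefig("unstacked_hist.png")
```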
Upvotes: 0