Thomas Matthew

Reputation: 2886

Histogram from pandas DataFrame

Data

Below is the data frame I wish to represent as a histogram, with each row as one point. The result won't be interesting, since this data gives three bins of equal size, but that's OK for now, so read on!

>>> outer_df
  patient                         cell  product
0   Pat_1               22RV1_PROSTATE       12
1   Pat_1               DU145_PROSTATE       15
2   Pat_1  LN18_CENTRAL_NERVOUS_SYSTEM        9
3   Pat_2               22RV1_PROSTATE       12
4   Pat_2               DU145_PROSTATE       15
5   Pat_2  LN18_CENTRAL_NERVOUS_SYSTEM        9
6   Pat_3               22RV1_PROSTATE       12
7   Pat_3               DU145_PROSTATE       15
8   Pat_3  LN18_CENTRAL_NERVOUS_SYSTEM        9
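To reproduce the frame above, something like the following sketch should work. Only the values come from the printout; the way the frame is constructed here is an assumption for illustration.

```python
import pandas as pd

# Values copied from the printout above; the construction itself is illustrative.
cells = ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"]
products = [12, 15, 9]
patients = ["Pat_1", "Pat_2", "Pat_3"]

outer_df = pd.DataFrame(
    {
        # Each patient is repeated once per cell line.
        "patient": [p for p in patients for _ in cells],
        "cell": cells * len(patients),
        "product": products * len(patients),
    },
    columns=["patient", "cell", "product"],
)

print(outer_df)
```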

Desired Result

Graph each row as a point on a histogram, but also be able to pick out a particular subset of the data and graph it as an overlaid histogram (e.g. all points from all cells would be in purple below, those belonging to just DU145_PROSTATE in red, and those belonging to 22RV1_PROSTATE in blue). I've illustrated this with a graphic from the pandas docs:

Overlaid histogram, with three distributions (I only need 2)

Attempt 1

I first tried the hist method for DataFrames, but got an error and a blank 4x4 grid of histograms.

>>> outer_df.hist()
Traceback (most recent call last):
  File "/usr/lib/python3.3/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1977, in hist_frame
    ax.hist(data[col].dropna().values, **kwds)
  File "/usr/lib/python3/dist-packages/matplotlib/axes.py", line 8099, in hist
    xmin = min(xmin, xi.min())
TypeError: unorderable types: str() < float()
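The last frame of the traceback shows matplotlib comparing a str with a float, which suggests hist() was handed the string columns (patient, cell) along with product. A minimal sketch of restricting the call to numeric columns, assuming a pandas version that provides select_dtypes and a headless matplotlib backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; drop this line when working interactively
import pandas as pd

# Same values as the frame in the question.
cells = ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"]
outer_df = pd.DataFrame({
    "patient": [p for p in ("Pat_1", "Pat_2", "Pat_3") for _ in cells],
    "cell": cells * 3,
    "product": [12, 15, 9] * 3,
})

# Keep only numeric columns so no strings reach matplotlib's hist().
numeric_df = outer_df.select_dtypes(include="number")
axes = numeric_df.hist()
```

Here only the product column survives the filter, so a single histogram is drawn.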

Attempt 2

Realizing that DataFrame.hist() "plots the histograms of the columns on multiple subplots", I moved away from it and tried outer_df.plot(kind='hist', stacked=True). Even though I took this directly from the docs, I'm stuck on this error:

>>> outer_df.plot(kind='hist', stacked=True)
Traceback (most recent call last):
  File "/usr/lib/python3.3/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1612, in plot_frame
    raise ValueError('Invalid chart type given %s' % kind)
ValueError: Invalid chart type given hist

Attempt 3 -- courtesy of @816

>>> outer_df.set_index(['patient', 'cell']).unstack('cell').plot(kind='hist', stacked=True)
Traceback (most recent call last):
  File "/usr/lib/python3.3/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1612, in plot_frame
    raise ValueError('Invalid chart type given %s' % kind)
ValueError: Invalid chart type given hist

Upvotes: 1

Views: 5229

Answers (2)

dermen

Reputation: 5372

How about this using the groupby method:

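# .items() works on Python 3 (the question's tracebacks show Python 3.3); .loc replaces the deprecated .ix
hist_data = {cell: outer_df.loc[inds, 'product'] for cell, inds in outer_df.groupby('cell').groups.items()}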

Each value in the dict is a Series, corresponding to the cell group. Next, iterate over the cell groups, plotting histograms each time:

import matplotlib.pyplot as plt

for cell in hist_data:
    hist_data[cell].hist(label=cell)  # each call draws onto the current axes
plt.legend()  # needed for the legend to show
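Putting the groupby idea together with the colours described in the question, a self-contained sketch might look like this. The shared bin edges, the alpha values, the "all cells" label and the headless Agg backend are assumptions added for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

cells = ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"]
outer_df = pd.DataFrame({
    "patient": [p for p in ("Pat_1", "Pat_2", "Pat_3") for _ in cells],
    "cell": cells * 3,
    "product": [12, 15, 9] * 3,
})

# Shared bin edges so the overlaid histograms line up (illustrative choice: width 1).
bins = np.linspace(outer_df["product"].min(), outer_df["product"].max(), 7)

fig, ax = plt.subplots()

# All rows, regardless of cell line.
ax.hist(outer_df["product"], bins=bins, color="purple", alpha=0.4, label="all cells")

# Overlay only the cell lines of interest, one histogram per group.
selected = {"DU145_PROSTATE": "red", "22RV1_PROSTATE": "blue"}
for cell, group in outer_df.groupby("cell"):
    if cell in selected:
        ax.hist(group["product"], bins=bins, color=selected[cell], alpha=0.6, label=cell)

ax.legend()
# plt.show() or fig.savefig(...) would display or save the figure.
```

Drawing everything on one Axes with explicit bins keeps the overlays comparable; letting each hist() call pick its own bins would give bars of different widths.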

Upvotes: 1

8one6

Reputation: 13788

How about:

outer_df.set_index(['patient', 'cell']).unstack('cell').plot(kind='hist', stacked=True)
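To see what this does, here is a sketch of the intermediate wide frame and the plot call. As far as I know, kind='hist' was only added to DataFrame.plot in pandas 0.15.0, so an older install would raise the "Invalid chart type" error; treat that version number as an assumption to verify. Unlike the one-liner above, this sketch selects the product column before unstacking so the columns are plain cell names, and it assumes a headless backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import pandas as pd

cells = ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"]
outer_df = pd.DataFrame({
    "patient": [p for p in ("Pat_1", "Pat_2", "Pat_3") for _ in cells],
    "cell": cells * 3,
    "product": [12, 15, 9] * 3,
})

# One row per patient, one column per cell line.
wide = outer_df.set_index(["patient", "cell"])["product"].unstack("cell")
print(wide)

# One stacked histogram series per column (needs a pandas that supports kind="hist").
ax = wide.plot(kind="hist", stacked=True)
```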

Upvotes: 0
