Reputation: 96478
Say I have a dataframe like the following:
     A  B  C  D
s1   1  2  4  2
s2   2  1  4  3
s3   1  4  1  3
I would like to get a bar plot that shows a histogram of the values in each column. That is, the histograms for the individual columns should sit next to each other along the x axis, with spacing between them. In other words, it would be a two-level bar chart, where for each column in the dataframe we have bars representing the histogram of that column.
In case it matters, we can assume that the set of possible values is known and is the same for every column (e.g. the range [0, 5]).
When I try doing:
df.plot(kind='bar')
I get something completely different from what I want (the x ticks correspond to the rows, instead of [columns: [value0, value1, ..., valueN]]). The closest "in spirit" to what I want is:
df.plot(kind='density')
But I am looking for a histogram-like description per column, more than an overlay of PDFs.
Hopefully an example helps. I am looking for something like the plot below (code here), but instead of showing two scores per group, it would show a histogram of the values in each column of my dataframe:
Upvotes: 2
Views: 6647
Reputation: 1457
This presentation doesn't rescale the data. It translates each histogram horizontally so that they don't overlap, then labels the x axis with the column names (placed at each column's median) rather than with a numeric scale.
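import matplotlib.pyplot as plt
from numpy.random import randn
from pandas import DataFrame

sample = 1000
df = DataFrame(randn(sample, 8))

spacer = 1  # horizontal gap between neighbouring histograms
offset = 0  # left edge of the current histogram on the shared x axis
ticks = []

fig, ax = plt.subplots()
for colname in df.columns:
    # Shift the column so that its minimum sits at the current offset.
    shifted = df[colname] - df[colname].min() + offset
    ticks.append(shifted.median())
    shifted.hist(ax=ax)
    # The next histogram starts one spacer to the right of this one.
    offset = shifted.max() + spacer

# Label each histogram with its column name, placed at the column median.
ax.set_xticks(ticks)
ax.set_xticklabels(df.columns)
plt.show()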
Upvotes: 1
Reputation: 919
NumPy provides the numpy.histogram function for computing the counts, and matplotlib provides the histogram plotting function hist (matplotlib.pyplot.hist).
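As a minimal sketch of how these pieces could be combined for the dataframe in the question: numpy.histogram counts the values per column, and the resulting table is transposed so that a pandas bar plot groups the bars by original column. The helper name column_histograms, the integer value range 0 to 5, and the output filename are assumptions made for illustration, not part of the original post.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def column_histograms(df, low=0, high=5):
    """Count how often each integer in [low, high] occurs in every column.

    Hypothetical helper: assumes integer-valued data in a known range.
    """
    values = np.arange(low, high + 1)
    # Bin edges centred on the integers: low-0.5, low+0.5, ..., high+0.5.
    edges = np.arange(low, high + 2) - 0.5
    counts = {col: np.histogram(df[col], bins=edges)[0] for col in df.columns}
    # Rows: original columns (A, B, ...); columns: the possible values.
    return pd.DataFrame(counts, index=values).T


df = pd.DataFrame(
    {"A": [1, 2, 1], "B": [2, 1, 4], "C": [4, 4, 1], "D": [2, 3, 3]},
    index=["s1", "s2", "s3"],
)

counts = column_histograms(df)
print(counts)

# One group of bars per original column, one bar per possible value.
ax = counts.plot(kind="bar", rot=0)
ax.set_xlabel("column")
ax.set_ylabel("count")
ax.legend(title="value")
ax.figure.savefig("column_histograms.png")  # assumed output filename
```

Because the counts table has the original columns as its index, df.plot(kind='bar') now puts A, B, C and D on the x axis, which is the grouping the question asks for. Replace the savefig call with plt.show() to view the figure interactively.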
Upvotes: 0