Pandas DataFrame: plot histogram with grouping

Question

I have tabular data in the following format:

id  value    valid
1   0.43323  true
2   0.83122  false
3   0.33132  true
4   0.58351  false
5   0.74143  true
6   0.44334  true
7   0.86436  false
8   0.73555  true
9   0.56534  false
10  0.66234  true
...

I am trying to plot a histogram like this one

I would like to know whether pandas can group the numeric values into bins (0.0 to 0.1, 0.1 to 0.2, and so on) and plot them as in the image, with each bar color-coded to show the true and false counts separately.

I am thinking of creating separate slices in a dictionary and then counting the true/false values in each one. I could then build a histogram from those counts. Is there a better way to plot such a histogram without doing all these calculations?

What I have so far with bin and cut:

new_df = df[['value','valid']]
bins = [0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1]
s = new_df.groupby(pd.cut(new_df['value'], bins=bins)).size()
s.plot(kind='bar', stacked=True)

With this I get a histogram of the total count per bin, but I am not able to color-code each bar by the true/false counts from the 'valid' column.

Scott Boston · Accepted Answer

Try:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

df = pd.DataFrame(
    {
        "value": np.random.random(1000),
        "valid": np.random.choice([True, False], p=[0.7, 0.3], size=1000),
    }
)

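# Assign each value to a 0.1-wide bin: (0.0, 0.1], (0.1, 0.2], ...
df["label"] = pd.cut(df["value"], bins=np.arange(0, 1.01, 0.1))

# Count rows per (bin, valid) pair, pivot "valid" into columns,
# then draw one stacked bar per bin.
ax = (
    df.groupby(["label", "valid"], observed=False)
    .count()
    .unstack()["value"]
    .plot.bar(stacked=True, rot=0, figsize=(10, 7))
)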
ax.legend(loc="upper center")
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
_ = ax.set_ylim(0, 150)

Output: (image not reproduced: stacked bar chart with one bar per bin, split by valid)
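
As a possible alternative to the groupby/count/unstack chain above, the same bin-by-valid count table can be built with pd.crosstab. This is only a sketch on synthetic data: the plot_stacked helper is a hypothetical name introduced here, and np.linspace with include_lowest=True is swapped in for the np.arange bin edges, so a value of exactly 0 also lands in the first bin.

```python
import numpy as np
import pandas as pd

# Synthetic data shaped like the answer's example (not the asker's real table).
rng = np.random.default_rng(123)
df = pd.DataFrame(
    {
        "value": rng.random(1000),
        "valid": rng.choice([True, False], p=[0.7, 0.3], size=1000),
    }
)

# Eleven edges give ten bins of width 0.1.
bins = np.linspace(0, 1, 11)
labels = pd.cut(df["value"], bins=bins, include_lowest=True)

# Rows are bins, columns are False/True, cells are row counts.
counts = pd.crosstab(labels, df["valid"])


def plot_stacked(counts):
    """Hypothetical helper: draw one stacked bar per bin from the count table."""
    import matplotlib.pyplot as plt  # imported here so the table can be built without it

    ax = counts.plot.bar(stacked=True, rot=0, figsize=(10, 7))
    ax.legend(title="valid", loc="upper center")
    ax.set_ylabel("count")
    return ax
```

Calling plot_stacked(counts) should give a chart equivalent to the one above; printing counts first is a quick way to check the numbers before plotting.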
