NewBee

Reputation: 839

Pandas DataFrame: plot a histogram with grouping

I have tabular data in the following format:

id  value    valid
1   0.43323  true
2   0.83122  false
3   0.33132  true
4   0.58351  false
5   0.74143  true
6   0.44334  true
7   0.86436  false
8   0.73555  true
9   0.56534  false
10  0.66234  true
...

I am trying to plot a histogram like this one:

enter image description here

Is there a way in pandas to group the numeric values into bins (0.0 to 0.1, 0.1 to 0.2, and so on) and plot them as in the image, with each bar color-coded to show the true and false counts separately?

I am thinking of creating separate slices in a dictionary and then counting the true/false values in each one, so that I can build the histogram from those counts. Is there a better way to plot such a histogram without doing all these calculations by hand?

What I have so far, using bins and pd.cut:

new_df = df[['value','valid']]
bins = [0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1]
s = new_df.groupby(pd.cut(new_df['value'], bins=bins)).size()
s.plot(kind='bar', stacked=True)

With this I get a histogram of the total count per bin, but I am not able to color-code each bar by the true/false counts of the 'valid' column.

Upvotes: 0

Views: 714

Answers (1)

Scott Boston

Reputation: 153460

Try:

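import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

# Synthetic data shaped like the question's table
df = pd.DataFrame(
    {
        "value": np.random.random(1000),
        "valid": np.random.choice([True, False], p=[0.7, 0.3], size=1000),
    }
)

# Assign each value to a bin of width 0.1: (0.0, 0.1], (0.1, 0.2], ...
df["label"] = pd.cut(df["value"], bins=np.arange(0, 1.01, 0.1))

# Count rows per (bin, valid) pair, then pivot "valid" into columns
# so that each bar is stacked by True/False.
# observed=False keeps empty bins and makes the categorical default explicit.
ax = (
    df.groupby(["label", "valid"], observed=False)
    .count()
    .unstack()["value"]
    .plot.bar(stacked=True, rot=0, figsize=(10, 7))
)
ax.legend(loc="upper center")
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
_ = ax.set_ylim(0, 150)
plt.show()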

Output:

enter image description here
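
For reference, here is a minimal sketch of the same idea applied to the ten sample rows from the question instead of random data. It assumes the 'valid' column holds real booleans (if it was read in as the strings 'true'/'false', it would need converting first). The bin labels, the "valid"/"invalid" column names, and the plot_stacked helper are illustrative choices of mine, not part of the original code.

```python
import numpy as np
import pandas as pd

# The ten sample rows shown in the question
df = pd.DataFrame(
    {
        "value": [0.43323, 0.83122, 0.33132, 0.58351, 0.74143,
                  0.44334, 0.86436, 0.73555, 0.56534, 0.66234],
        "valid": [True, False, True, False, True,
                  True, False, True, False, True],
    }
)

# Ten equal-width bins on [0, 1] with short, readable labels (illustrative)
bins = np.linspace(0, 1, 11)
labels = [f"{lo:.1f}-{hi:.1f}" for lo, hi in zip(bins[:-1], bins[1:])]
df["bin"] = pd.cut(df["value"], bins=bins, labels=labels, include_lowest=True)

# One row per bin, one column per value of "valid"
counts = (
    df.groupby(["bin", "valid"], observed=False)
    .size()
    .unstack(fill_value=0)
    .rename(columns={True: "valid", False: "invalid"})
)


def plot_stacked(counts):
    """Draw the per-bin counts as a stacked bar chart (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the counting works without it

    ax = counts.plot.bar(stacked=True, rot=0)
    ax.set_xlabel("value bin")
    ax.set_ylabel("count")
    plt.show()
    return ax
```

Printing counts shows the numbers behind each bar, and calling plot_stacked(counts) draws the chart. Using size() rather than count() avoids having to select the "value" column after unstacking.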

Upvotes: 2
