Reputation: 839
I have tabular data in the following format:
id value valid
1 0.43323 true
2 0.83122 false
3 0.33132 true
4 0.58351 false
5 0.74143 true
6 0.44334 true
7 0.86436 false
8 0.73555 true
9 0.56534 false
10 0.66234 true
...
I am trying to plot a histogram like this one
I would like to know whether there is a way to do this with a pandas DataFrame: group the numeric values into bins from 0.0 to 0.1, 0.1 to 0.2, and so on, and plot them as in the image, with each bar color-coded to show the true and false counts separately.
My current idea is to create separate slices in a dictionary and count the true/false values for each slice, then build the histogram from those counts. Is there a better way to plot such a histogram without doing all these calculations by hand?
What I have so far, using bins and pd.cut:
new_df = df[['value','valid']]
bins = [0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1]
s = new_df.groupby(pd.cut(new_df['value'], bins=bins)).size()
s.plot(kind='bar', stacked=True)
With this I get a histogram of the total count per bin, but I cannot color-code each bar by the true/false counts from the 'valid' column.
Upvotes: 0
Views: 714
Reputation: 153460
Try:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123)
df = pd.DataFrame(
    {
        "value": np.random.random(1000),
        "valid": np.random.choice([True, False], p=[0.7, 0.3], size=1000),
    }
)
df["label"] = pd.cut(df["value"], bins=np.arange(0, 1.01, 0.1))
ax = (
    # observed=False keeps empty bins and avoids the FutureWarning in newer pandas
    df.groupby(["label", "valid"], observed=False)
    .count()
    .unstack()["value"]
    .plot.bar(stacked=True, rot=0, figsize=(10, 7))
)
ax.legend(loc="upper center")
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
_ = ax.set_ylim(0, 150)
Output: stacked bar chart of counts per 0.1-wide bin, split by valid (True/False).
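For comparison, the same per-bin True/False count table could probably also be built with pd.crosstab instead of the groupby/unstack chain. This is only a minimal sketch, not part of the original answer: it assumes the ten sample rows shown in the question, and the names bins, labels and counts are illustrative.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame(
    {
        "value": [0.43323, 0.83122, 0.33132, 0.58351, 0.74143,
                  0.44334, 0.86436, 0.73555, 0.56534, 0.66234],
        "valid": [True, False, True, False, True,
                  True, False, True, False, True],
    }
)

# Bin edges 0.0, 0.1, ..., 1.0 (illustrative; same edges as in the question).
bins = [i / 10 for i in range(11)]

# Assign each value to its interval, e.g. 0.43323 falls in (0.4, 0.5].
labels = pd.cut(df["value"], bins=bins)

# Cross-tabulate bins against 'valid': one row per bin, one column per flag.
counts = pd.crosstab(labels, df["valid"])
print(counts)
```

Calling counts.plot.bar(stacked=True, rot=0) on the result should then draw one stacked bar per bin, much like the chart above. Note that pd.cut uses right-closed intervals by default, so a value of exactly 0 would fall outside the first bin unless include_lowest=True is passed.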
Upvotes: 2