Reputation: 137
I have a DataFrame with the following structure:
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': np.random.randint(1, 13, size=1000),
                   'VALUE': np.random.randint(0, 300, size=1000)})
How can I plot a graph with percentiles (10%, 20%, ..., 90%) on the X-axis and, on the Y-axis, the number of values that lie between consecutive percentile ticks, for example between 20% and 30%? There must be a separate plot for every ID (each with its own percentile values).
I've computed the percentiles, but I'm stuck:
q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
df.groupby('ID')['VALUE'].quantile(q)
I guess the plot should look like a histogram of VALUE, but with percentages on the X-axis instead of numeric values.
Upvotes: 2
Views: 12620
Reputation: 150745
Try:
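df['Quantile'] = pd.qcut(df.VALUE, q=np.arange(0, 1.1, 0.1))  # decile bin of each row, computed over all IDs
tmp_df = df.pivot_table(index='Quantile', columns='ID', values='VALUE', aggfunc='count')  # row count per bin and ID
tmp_df.plot(kind='bar', subplots=True, figsize=(10, 10))  # one bar chart per ID
plt.show()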
In the output, each subplot shows the per-quantile counts for one ID.
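Note that the snippet above bins on quantiles of the whole VALUE column. If every ID should get its own percentile edges, as the question asks, a minimal sketch could look like the following. The helper name `decile_counts`, the bin labels, and the seeded generator are illustrative assumptions; ties are broken by ranking first so that `pd.qcut` does not hit duplicate bin edges.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's frame (seeded only for reproducibility)
rng = np.random.default_rng(0)
df = pd.DataFrame({'ID': rng.integers(1, 13, size=1000),
                   'VALUE': rng.integers(0, 300, size=1000)})

edges = np.linspace(0, 1, 11)                          # 0%, 10%, ..., 100%
labels = [f'{10 * i}%-{10 * (i + 1)}%' for i in range(10)]

def decile_counts(values):
    """Count how many values fall into each decile bin of `values` itself (hypothetical helper)."""
    # rank(method='first') makes every value unique, so the bin edges cannot collide
    codes = pd.qcut(values.rank(method='first'), q=edges, labels=False)
    return np.bincount(codes.to_numpy(), minlength=len(labels))

# One column per ID, one row per percentile range
counts = pd.DataFrame({id_: decile_counts(group)
                       for id_, group in df.groupby('ID')['VALUE']},
                      index=labels)
print(counts)
```

The result can then be drawn with `counts.plot(kind='bar', subplots=True, figsize=(10, 10))`. Keep in mind that, by construction, each decile bin holds roughly one tenth of that ID's rows, so the bars within a subplot will be close to equal.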
Upvotes: 2
Reputation: 1106
q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
for name, group in df.groupby('ID'):  # Group by the ID column
    _, bins = pd.qcut(group.VALUE, q, retbins=True)  # Quantile edges for this ID's values
    plt.figure()
    ax = group.VALUE.hist(bins=bins, grid=False)  # Histogram of the data with those edges as bins
    ax.set_xticks(bins)  # One tick per quantile edge
    ax.set_xticklabels([f'{x * 100:.0f}%' for x in q])  # Label the edges as percentiles (NOT TESTED)
    ax.set_title(f'ID {name}')
plt.show()
I'm not including the output plots here because there are a lot of them. This produces the plot you want, but you will still need to adapt the ticks and formatting.
To get a normalized plot, with the Y-axis ranging from 0 to 100%, you would need to normalize your data before plotting (maybe something like group.VALUE.count() / df.VALUE.count()).
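One way the normalization idea could be sketched is to compute the bin counts with `np.histogram` and convert them to percentages. This version divides by the size of each ID's group rather than by `df.VALUE.count()`, which is an assumption about what "normalized" should mean here; the `shares` dictionary and the seeded sample data are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the question's frame (seeded only for reproducibility)
rng = np.random.default_rng(0)
df = pd.DataFrame({'ID': rng.integers(1, 13, size=1000),
                   'VALUE': rng.integers(0, 300, size=1000)})

q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

shares = {}  # ID -> percentage of that ID's rows in each quantile interval
for name, group in df.groupby('ID'):
    bins = group['VALUE'].quantile(q).to_numpy()         # quantile edges for this ID
    counts, _ = np.histogram(group['VALUE'], bins=bins)  # rows between consecutive edges
    # Assumption: normalize by the group size, so values are "percent of this ID's rows"
    shares[name] = 100 * counts / len(group)

for name, pct in shares.items():
    print(name, np.round(pct, 1))
```

Values below the 10% edge or above the 80% edge fall outside the bins, so the percentages for one ID do not add up to 100. When plotting directly, passing `weights=np.full(len(group), 100 / len(group))` to `group.VALUE.hist(...)` should give the same scaling.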
Upvotes: 2