Denis Ka

Reputation: 137

Plot a histogram, based on percentiles

I have a DataFrame with the following structure:

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': np.random.randint(1, 13, size=1000),
                   'VALUE': np.random.randint(0, 300, size=1000)})

How could I plot a graph where the X-axis shows percentiles (10%, 20%, ..., 90%) and the Y-axis shows the number of values that lie between consecutive percentile ticks, for example between 20% and 30%? There must be a separate plot for every ID (each with its own percentile values).

I've found the percentiles, but I'm stuck after that:

q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
df.groupby('ID')['VALUE'].quantile(q)

I guess the plot should look like a histogram of VALUE, but with percentages on the X-axis instead of numeric values.

Upvotes: 2

Views: 12620

Answers (2)

Quang Hoang

Reputation: 150745

Try:

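import matplotlib.pyplot as plt

# Assign each row to a decile bin of VALUE (edges computed over the whole frame)
df['Quantile'] = pd.qcut(df.VALUE, q=np.arange(0,1.1,0.1))
# Count rows per decile bin (rows) and ID (columns)
tmp_df = df.pivot_table(index='Quantile', columns='ID', aggfunc='count')
tmp_df.plot(kind='bar', subplots=True, figsize=(10,10))
plt.show()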

In the output, each subplot shows the quantile counts for one ID.
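
One thing to be aware of: `pd.qcut(df.VALUE, ...)` above computes the decile edges over the whole frame, so every ID shares the same bins. If each ID should get its own percentile values, as the question asks, a minimal sketch might look like this. The `Decile` column and the `decile_code` helper are illustrative names, the seeded data only mirrors the question's frame, and ranking before `qcut` is an assumption made here to avoid duplicate bin edges when values tie.

```python
import numpy as np
import pandas as pd

# Seeded stand-in for the frame from the question (assumption, for reproducibility)
rng = np.random.default_rng(0)
df = pd.DataFrame({'ID': rng.integers(1, 13, size=1000),
                   'VALUE': rng.integers(0, 300, size=1000)})


def decile_code(values):
    # Hypothetical helper: returns 0..9, the decile of each value within its group.
    # Ranking first guarantees unique values, so qcut never sees duplicate edges.
    return pd.qcut(values.rank(method='first'), q=10, labels=False)


# Decile of each VALUE relative to the other rows with the same ID
df['Decile'] = df.groupby('ID')['VALUE'].transform(decile_code)

# Rows: decile code (0 = 0%-10%, ..., 9 = 90%-100%); columns: ID
counts = pd.crosstab(df['Decile'], df['ID'])
print(counts)
```

`counts` can then be plotted with `counts.plot(kind='bar', subplots=True)` in the same way as `tmp_df`. Because each ID's bins are its own deciles, the counts within a column come out nearly equal by construction.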


Upvotes: 2

flurble

Reputation: 1106

import matplotlib.pyplot as plt
import pandas as pd

q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

for name, group in df.groupby('ID'):  # Group by ID column
    _, bins = pd.qcut(group.VALUE, q, retbins=True)  # Bin edges at this group's quantiles
    fig, ax = plt.subplots()
    group.VALUE.hist(bins=bins, ax=ax, grid=False)  # Histogram of the data with those bin edges
    ax.set_xticks(bins)  # One tick per quantile edge
    ax.set_xticklabels([f'{x * 100:.0f}%' for x in q])  # Label each edge with its percentile
    ax.set_title(f'ID {name}')
    plt.show()

I'm not including the output plots here because there are a lot of them. This produces the kind of plot you want, but you may still need to adapt the ticks and formatting.

To get a normalized plot, with the Y-axis ranging from 0 to 100%, you would need to normalize your data before plotting (maybe something like group.VALUE.count() / df.VALUE.count()).
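
As a rough sketch of that normalization, the counts per bin can be computed with `np.histogram` and divided by a total before plotting. The names `shares` and `edges` are illustrative, the seeded data only mirrors the question's frame, and dividing by `len(group)` is an assumption: it gives percentages of each ID's own rows, whereas dividing by `len(df)`, as the expression above suggests, would give shares of the whole frame instead.

```python
import numpy as np
import pandas as pd

# Seeded stand-in for the frame from the question (assumption, for reproducibility)
rng = np.random.default_rng(0)
df = pd.DataFrame({'ID': rng.integers(1, 13, size=1000),
                   'VALUE': rng.integers(0, 300, size=1000)})

q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

shares = {}  # ID -> percentage of that ID's rows falling in each quantile bin
for name, group in df.groupby('ID'):
    # Bin edges at this group's own quantiles
    edges = group['VALUE'].quantile(q).to_numpy()
    # Raw counts between consecutive edges (values outside the edges are ignored)
    counts, _ = np.histogram(group['VALUE'], bins=edges)
    # Convert counts to percentages of the group's rows
    shares[name] = 100 * counts / len(group)

for name, pct in shares.items():
    print(name, np.round(pct, 1))
```

Each array in `shares` has one entry per pair of neighbouring quantiles and can be drawn with a bar plot. Alternatively, matplotlib's `hist` accepts a `weights` argument, so passing `weights=np.full(len(group), 100 / len(group))` to the histogram call should scale the bars the same way.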

Upvotes: 2
