منى

Reputation: 666

Represent intervals on the x axis of a histogram in Python

I am trying to plot the column percT as a histogram in Python. Here is my input file:

programName,reqMethID,countT,countN,countU,totalcount,percT,percN,percU
chess,1-9,0,1,0,1,0.0,100.0,0.0
chess,1-16,1,1,0,2,50.0,50.0,0.0
chess,1-4,1,2,0,3,33.33,66.67,0.0
chess,2-9,1,3,0,4,25.0,75.0,0.0
chess,2-16,1,4,0,5,20.0,80.0,0.0
chess,2-4,1,5,0,6,16.67,83.33,0.0
chess,3-9,1,6,0,7,14.29,85.71,0.0
chess,3-16,1,7,0,8,12.5,87.5,0.0
chess,3-4,1,8,0,9,11.11,88.89,0.0
chess,4-9,1,9,0,10,10.0,90.0,0.0
chess,4-16,1,10,0,11,9.09,90.91,0.0
chess,4-4,2,10,0,12,16.67,83.33,0.0
chess,5-9,2,11,0,13,15.38,84.62,0.0
chess,5-16,2,12,0,14,14.29,85.71,0.0
chess,5-4,2,13,0,15,13.33,86.67,0.0
chess,6-9,3,13,0,16,18.75,81.25,0.0
chess,6-16,3,14,0,17,17.65,82.35,0.0
chess,6-4,3,15,0,18,16.67,83.33,0.0
chess,7-9,4,15,0,19,21.05,78.95,0.0
chess,7-16,4,16,0,20,20.0,80.0,0.0
chess,7-4,4,17,0,21,19.05,80.95,0.0
chess,8-9,4,18,0,22,18.18,81.82,0.0
chess,8-16,4,19,0,23,17.39,82.61,0.0
chess,8-4,4,20,0,24,16.67,83.33,0.0
chess,1-10,0,1,0,1,0.0,100.0,0.0
chess,1-17,1,1,0,2,50.0,50.0,0.0
chess,2-10,1,2,0,3,33.33,66.67,0.0
chess,2-17,1,3,0,4,25.0,75.0,0.0
chess,3-10,1,4,0,5,20.0,80.0,0.0
chess,3-17,1,5,0,6,16.67,83.33,0.0
chess,4-10,1,6,0,7,14.29,85.71,0.0
chess,4-17,1,7,0,8,12.5,87.5,0.0
chess,5-10,1,8,0,9,11.11,88.89,0.0
chess,5-17,1,9,0,10,10.0,90.0,0.0
chess,6-10,2,9,0,11,18.18,81.82,0.0

Here is the code I am using to plot the data above as a histogram:

    import matplotlib.pyplot as plt
    import pandas as pd

    dataset = pd.read_csv('TNUPercentages.txt', sep=',', index_col=False)
    X_ticks_array = list(range(0, 101, 10))  # ticks at 0, 10, ..., 100
    plt.xticks(X_ticks_array)

    Tdata = dataset['percT']
    print(Tdata.head())
    plt.hist(Tdata)  # no bins given, so matplotlib picks 10 equal-width bins
    plt.xlabel('Percentages of T')
    plt.ylabel('Frequency')
    plt.show()

The problem is that I am getting this graph. The x axis shows the values in the percT column and the y axis shows how often each value occurs. With the default bins it is hard to tell the frequency of 0 apart from the frequency of 5 or 10. I would like the x axis to have 11 bins, one for each of the following intervals: 0, (0, 10], (10, 20], (20, 30], (30, 40], (40, 50], (50, 60], (60, 70], (70, 80], (80, 90], (90, 100]. Each percT value falls into exactly one of these intervals, and the y axis should show how many values fall into each interval. How can I do that?

[Image: histogram]
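The eleven intervals listed above have a compact closed form: a value x between 0 and 100 belongs to bin ceil(x / 10), so 0 maps to bin 0, (0, 10] to bin 1, and so on up to (90, 100] in bin 10. A minimal sketch of counting with that formula, assuming the values are rounded as in the CSV (a float such as 30.000000000000004 would land one bin too high) and using a hypothetical percT series built from a few of the sample values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dataset['percT'], using a few values from the sample rows
percT = pd.Series([0.0, 50.0, 33.33, 25.0, 20.0, 16.67, 10.0, 9.09, 0.0])

# ceil(x / 10): 0 -> 0, (0, 10] -> 1, (10, 20] -> 2, ..., (90, 100] -> 10
bin_index = np.ceil(percT / 10).astype(int).to_numpy()

# minlength=11 keeps intervals with no values as explicit zero counts
counts = np.bincount(bin_index, minlength=11)

# Human-readable labels matching the question's notation
labels = ["0"] + [f"({lo}, {lo + 10}]" for lo in range(0, 100, 10)]
freq = pd.Series(counts, index=labels)
print(freq)
```

Calling freq.plot.bar() on the result would then draw one bar per interval, with empty intervals shown as zero-height bars.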

Upvotes: 3

Views: 10237

Answers (3)

fireball.1

Reputation: 1521

You can try seaborn for a cleaner look; it is a plotting library built on top of matplotlib.

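    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns

    sns.set_theme()  # seaborn's default style: grey background with a white grid
    dataset = pd.read_csv('TNUPercentages.txt', sep=',', index_col=False)
    X_ticks_array = list(range(0, 101, 10))
    plt.xticks(X_ticks_array)

    Tdata = dataset['percT']
    print(Tdata.head())
    # distplot is deprecated in recent seaborn; histplot(kde=True) draws the histogram plus a
    # density curve, and its default stat='count' matches the 'Frequency' label below
    sns.histplot(Tdata, bins=np.arange(0, 101, 10), kde=True)  # edges 0, 10, ..., 100
    plt.xlabel('Percentages of T')
    plt.ylabel('Frequency')
    plt.show()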


If you don't want the density curve, you can keep seaborn's background grid and draw the histogram with plain plt.hist:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns

    sns.set_theme()  # only the seaborn style is used; the plot itself is plain matplotlib
    dataset = pd.read_csv('TNUPercentages.txt', sep=',', index_col=False)
    X_ticks_array = list(range(0, 101, 10))
    plt.xticks(X_ticks_array)

    Tdata = dataset['percT']
    print(Tdata.head())
    plt.hist(Tdata, bins=np.arange(0, 101, 10))  # edges 0, 10, ..., 100
    plt.xlabel('Percentages of T')
    plt.ylabel('Frequency')
    plt.show()

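The bars from plt.hist above are not quite the intervals asked for in the question. matplotlib's hist computes its counts with numpy.histogram, whose bins are half-open [left, right), except the last one, which also includes its right edge. So a value of 10.0 is counted in the 10 to 20 bar rather than in (0, 10], and 0 shares a bar with values just above it. A minimal sketch that inspects the counts directly, using a hypothetical array of a few sample values:

```python
import numpy as np

# A few percT values from the sample data, including the boundary values 10, 20 and 50
values = np.array([0.0, 9.09, 10.0, 16.67, 20.0, 25.0, 50.0])

# Each bin is [left, right); only the last bin [90, 100] is closed on the right
counts, bin_edges = np.histogram(values, bins=np.arange(0, 101, 10))

# 0 and 9.09 share the first bin; 10.0 is counted together with 16.67
print(counts)
print(bin_edges)
```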

Upvotes: 1

Quang Hoang

Reputation: 150735

Do you mean:

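    import matplotlib.pyplot as plt
    import numpy as np

    bins = np.arange(0, 101, 10)  # edges 0, 10, ..., 100 (arange stops before 101)
    plt.hist(dataset['percT'], bins=bins, edgecolor='w')  # dataset as loaded in the question
    plt.xticks(bins)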

Output:

[Image: output plot]


Update, per the comments:

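    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    # edges -10, 0, 10, ..., 100: (-10, 0] holds exactly the zeros, then (0, 10], ..., (90, 100]
    bins = np.arange(-10, 101, 10)

    (pd.cut(dataset['percT'], bins=bins, labels=bins[1:])  # label each interval by its right edge
       .value_counts()  # counting the categorical keeps empty intervals as 0
       .sort_index()
       .plot.bar(align='edge', width=1, edgecolor='w')
    )
    plt.xticks(np.arange(len(bins)), bins)
    plt.show()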

Output:

[Image: output plot]
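The update gives the separate 0 bar the question asks for because its first interval, (-10, 0], contains exactly the zeros (percentages are never negative). A small sketch of just the counting step, using a hypothetical percT series built from a few sample values, shows that counting on the categorical result of pd.cut (before any conversion to int) keeps the empty intervals (50, 60] through (90, 100] as zero counts, so each bar stays at the position of its interval:

```python
import numpy as np
import pandas as pd

# A few percT values from the sample data
percT = pd.Series([0.0, 0.0, 9.09, 10.0, 16.67, 20.0, 25.0, 33.33, 50.0])

# Edges -10, 0, 10, ..., 100 give 11 right-closed intervals
bins = np.arange(-10, 101, 10)

# Label each interval by its right edge (0, 10, ..., 100)
binned = pd.cut(percT, bins=bins, labels=bins[1:])

# value_counts on a categorical reports every category, including empty ones;
# sort_index puts them back in interval order
counts = binned.value_counts().sort_index()
print(counts)
```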

Upvotes: 2

Paul H

Reputation: 68136

The pandas cut function and the value_counts method can help here:

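    import numpy
    import pandas
    from matplotlib import pyplot

    data = pandas.read_csv('TNUPercentages.txt')

    fig, ax = pyplot.subplots(figsize=(6, 3.5))
    (
        pandas.cut(data['percT'], bins=numpy.arange(0, 101, 10))  # intervals (0, 10], ..., (90, 100]
            .value_counts()
            .sort_index()
            .plot.bar(ax=ax)
    )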

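Note that the question also asks for a separate bar for 0. With edges starting at 0, pd.cut's default right-closed intervals begin at (0, 10], so values of exactly 0 lie outside every interval: pd.cut returns NaN for them and value_counts drops them by default. A minimal sketch on a hypothetical percT series built from a few sample values, with the edges extended to 100 so that (90, 100] is included:

```python
import numpy as np
import pandas as pd

# A few percT values from the sample data, including two exact zeros
percT = pd.Series([0.0, 0.0, 9.09, 10.0, 16.67, 50.0])

# Right-closed intervals (0, 10], (10, 20], ..., (90, 100]
binned = pd.cut(percT, bins=np.arange(0, 101, 10))

# 0.0 is not inside (0, 10], so pd.cut marks those rows as NaN
n_dropped = int(binned.isna().sum())

# value_counts() skips NaN by default, so the zeros never reach the bar chart
counts = binned.value_counts().sort_index()

print(n_dropped)     # rows that fell outside every interval
print(counts.sum())  # rows that were actually counted
```

Starting the edges below 0, as in Quang Hoang's update, gives the zeros their own interval instead.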

Upvotes: 3
