Reputation: 29
I have written the following code to plot a histogram from the values in one column of a CSV file:
import pandas as pd
import matplotlib.pyplot as plt
import numpy

class createHistogram():
    def __init__(self, csv_file):
        self.csv_file = csv_file

    def load_csv(self):
        bin_edge = range(0, 100, 10)
        tp_data = pd.read_csv(self.csv_file)
        dataframe = pd.DataFrame(tp_data)['tp']
        dataframe.hist(bins=bin_edge)
        plt.show()
        return tp_data
With this I get a histogram whose bins cover values less than 10, less than 20, and so on, but I want the bins to be:
bin_value<=10
10< bin_value <=20
20
I am new to the pandas module.
Upvotes: 1
Views: 2489
Reputation: 3103
You can use pandas' built-in pd.cut, which defines the bins as intervals.
    import numpy as np
    import pandas as pd

    ser = pd.Series(np.random.randint(1, 100, 50))
    bins = range(0, 101, 10)
pd.cut assigns each value to one of these bins and returns the result as categorical data.
In [4]: pd.cut(ser, bins).cat.categories
Out[4]:
IntervalIndex([(0, 10], (10, 20], (20, 30], (30, 40], (40, 50], (50, 60], (60, 70], (70, 80], (80, 90], (90, 100]]
closed='right',
dtype='interval[int64]')
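To make the closed='right' behaviour concrete, here is a minimal sketch with a few hand-picked boundary values. The sample values are made up for illustration and are not from the question's CSV. It shows that 10 lands in (0, 10] and 11 in (10, 20], which matches the bin_value<=10 / 10< bin_value <=20 layout asked for, and that a value equal to the lowest edge is left unbinned unless include_lowest=True is passed.

```python
import pandas as pd

# Hypothetical sample values chosen to sit on or near bin edges.
values = pd.Series([0, 5, 10, 11, 20, 100])
bins = range(0, 101, 10)

# Default: intervals are open on the left and closed on the right.
binned = pd.cut(values, bins)

# include_lowest=True widens the first interval so the lowest edge (0) is kept.
binned_with_zero = pd.cut(values, bins, include_lowest=True)

print(binned.astype(str).tolist())
print(binned_with_zero.astype(str).tolist())
```

If the data can contain values exactly equal to the lowest edge, include_lowest=True is probably what you want; otherwise those rows come back as NaN and are silently dropped from the counts.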
If you would like to further plot them, it would go something like this:
In [5]: pd.cut(ser, bins).value_counts().plot(kind='bar')
Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x10e673b70>
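One thing to watch for: value_counts() sorts by count by default, so the bars may not appear in bin order. A small sketch of one way to keep them in interval order is below. The seeded random data and the axis labels are assumptions for illustration; with the question's data you would pass the 'tp' column instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data only: 50 random integers in 1..100 with a fixed seed.
rng = np.random.default_rng(0)
ser = pd.Series(rng.integers(1, 101, size=50))
bins = range(0, 101, 10)

# Count values per interval, then restore the natural bin order.
counts = pd.cut(ser, bins).value_counts().sort_index()

ax = counts.plot(kind="bar")
ax.set_xlabel("bin")
ax.set_ylabel("count")
plt.tight_layout()
# plt.show()  # uncomment to display the figure interactively
```

Because the result of pd.cut is categorical, empty bins should still show up with a count of zero, so the x-axis always covers the full range of intervals.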
Upvotes: 2