ca_san

Reputation: 193

Histogram of a pandas dataframe

I couldn't find a similar question anywhere on the site.

I have a fairly large file, with over 100000 lines, which I read using pandas:

df = pd.read_excel("somefile.xls",index_col='Offense Type')

I ended up with a dataframe consisting of the index column, 'Offense Type', and one other column, 'Hour'.

'Offense Type' consists of a series of categories, say cat1, cat2, cat3, etc. 'Hour' consists of integers between 1 and 24.

What I would like to obtain is a histogram of the occurrences of each hour within each category (there aren't that many categories, at most 10 of them).

Here's an ASCII representation of what I want to get:

(The x's represent the bars of the histogram; the real counts will surely be much higher than 1, 2 or 3.)

   x        x         # And so on
 x x  x     x x  x    #
 x x  x  x  x x  x    #
 1 2 11 20  5 8 18    #
   Cat1      Cat2     #

But I'm getting a single bar for every row in df using:

df.plot(kind='bar')

which is basically unreadable:

[image: histogram_of_dataframe]

I've also tried the hist() and Histogram() functions, with no luck.

Here's some sample data:

[image: sample_data]

Upvotes: 0

Views: 9335

Answers (1)

ca_san

Reputation: 193

After a long night, I got the answer. Since every event occurs only once, I added an extra column to the file containing the number one, and then indexed the dataframe by that column:

df = pd.read_excel("somefile.xls",index_col='Numberone')

And then simply tried this:

df.hist(by=df['Offense Type'])

which finally gave me exactly what I wanted.
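
For anyone who would rather not add a helper column to the file, here is a minimal sketch of the same idea. It assumes 'Offense Type' and 'Hour' are both ordinary columns (if 'Offense Type' was read in as the index, calling df.reset_index() first should get you there). The sample values and the two function names are made up for illustration and are not taken from the original data.

```python
import pandas as pd

# Hypothetical sample data standing in for somefile.xls (not the original data).
sample = pd.DataFrame(
    {
        "Offense Type": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2"],
        "Hour": [1, 2, 2, 5, 5, 8],
    }
)


def hour_counts(df):
    """Count how often each hour occurs within each offense type."""
    # Rows are offense types, columns are the hours that appear in the data.
    return pd.crosstab(df["Offense Type"], df["Hour"])


def plot_hour_histograms(df):
    """Draw one histogram of 'Hour' per offense type (needs matplotlib)."""
    # Bin edges 1..25 give one bin for each hour value 1..24.
    return df["Hour"].hist(by=df["Offense Type"], bins=list(range(1, 26)))


counts = hour_counts(sample)
print(counts)
```

The table from hour_counts holds the bar heights, which is handy for checking the plot. Calling plot_hour_histograms(sample) should then draw one panel per category, much like the df.hist(by=...) call above.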

Upvotes: 1
