Reputation: 193
I couldn't find a similar question anywhere on the site.
I have a fairly large file, with over 100,000 lines, which I read using pandas:
df = pd.read_excel("somefile.xls",index_col='Offense Type')
I ended up with a DataFrame consisting of the index column, 'Offense Type', and one other column, 'Hour'.
'Offense Type' consists of a series of categories, say cat1, cat2, cat3, etc. 'Hour' consists of integers between 1 and 24.
What I would like to obtain is a histogram of the occurrences of each hour for each category (there aren't that many categories, at most 10 of them).
Here's an ASCII representation of what I want to get:
(the x's represent the bars in the histogram; they will surely reach much higher values than 1, 2 or 3)
x x # And so on
x x x x x x #
x x x x x x x #
1 2 11 20 5 8 18 #
Cat1 Cat2 #
But I'm getting a separate bar for every row in df using:
df.plot(kind='bar')
which is basically unreadable.
I've also tried the hist() and Histogram() functions with no luck.
Here's some sample data:
Upvotes: 0
Views: 9335
Reputation: 193
After a long night, I got the answer. Since every event occurs only once, I added an extra column to the file containing the number one and then indexed the DataFrame by it:
df = pd.read_excel("somefile.xls",index_col='Numberone')
And then simply tried this:
df.hist(by=df['Offense Type'])
which finally gave me exactly what I wanted.
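In hindsight, I suspect the extra column mainly helped because 'Offense Type' stopped being the index and became an ordinary column that hist() could group by. Here is a minimal sketch of the same idea without the helper column. It assumes synthetic stand-in data (the DataFrame below is made up for illustration, not my real file) and the default integer index instead of index_col:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view plots on screen

# Synthetic stand-in for the spreadsheet (hypothetical data, not the real file).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Offense Type": rng.choice(["cat1", "cat2", "cat3"], size=1000),
    "Hour": rng.integers(1, 25, size=1000),  # integers from 1 to 24 inclusive
})

# Table of occurrences: one row per hour, one column per category.
counts = pd.crosstab(df["Hour"], df["Offense Type"])

# One histogram panel per category, with one bin per hour.
axes = df["Hour"].hist(by=df["Offense Type"], bins=np.arange(1, 26))
```

With the real file, reading it without index_col='Offense Type' should leave that column available for by=. The counts table can also be plotted directly with counts.plot(kind='bar') if grouped bars in a single figure are preferred.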
Upvotes: 1