Reputation: 1700
I am new to pandas and matplotlib. I have a csv file which consist of year from 2012 to 2018. For each month of the year, I have Rain data. I want to analyze by the histogram, which month of the year having maximum rainfall. Here is my dataset.
year month Temp Rain
2012 1 10 100
2012 2 20 200
2012 3 30 300
.. .. .. ..
2012 12 40 400
2013 1 50 300
2013 2 60 200
.. .. .. ..
2018 12 70 400
I could not able to plot with histogram, I tried plotting with the bar but not getting desired result. Here what I have tried:
import pandas as pd
import numpy as npy
import matplotlib.pyplot as plt
df2=pd.read_csv('Monthly.csv')
df2.groupby(['year','month'])['Rain'].count().plot(kind="bar",figsize=(20,10))
Please suggest me an approach to plot an histogram to analyze maxmimum rainfall happening in which month grouped by year.
Upvotes: 0
Views: 1253
Reputation: 1633
First groubby year and month as you already did, but only keep the maximum rainfall.
series_df2 = df2.groupby(['year','month'], sort=False)['Rain'].max()
Then unstack the series, transpose it and plot it.
series_df2.unstack().T.plot(kind='bar', subplots=False, layout=(2,2))
This will give you an output that looks like this for your sample data:
Upvotes: 0
Reputation: 1286
Probably you don't want to see the count
per group but
df2.groupby(['year','month'])['Rain'].first().plot(kind="bar",figsize=(20,10))
or maybe
df2.groupby(['month'])['Rain'].sum().plot(kind="bar",figsize=(20,10))
Upvotes: 1
Reputation: 17007
you are closed to solution, i'll write: use max() and not count()
df2.groupby(['year','month'])['Rain'].max().plot(kind="bar",figsize=(20,10))
Upvotes: 1