Reputation: 85
I am trying to plot a graph to represent a monthly river discharge dataset from 1980-01-01 to 2013-12-31.
Please check out this graph
The plan is to plot "Jan Feb Mar Apr May...Dec" as the x-axis and the discharge (m3/s) as the y-axis. The actual lines on the graphs would represent the years. Alternatively, the lines on the graph would showcase monthly average (from jan to dec) of every year from 1980 to 2013.
DAT = pd.read_excel('Modelled Discharge_UIB_1980-2013_Daily.xlsx',
sheet_name='Karhmong', header=None, skiprows=1,
names=['year', 'month', 'day', 'flow'],
parse_dates={ 'date': ['year', 'month', 'day'] },
index_col='date')
the above is to show what type of data it is
date flow
1980-01-01 104.06
1980-01-02 103.81
1980-01-03 103.57
1980-01-04 103.34
1980-01-05 103.13
... ...
2013-12-27 105.65
2013-12-28 105.32
2013-12-29 105.00
2013-12-30 104.71
2013-12-31 104.42
because I want to compare all the years to each other so I tried the below command
DAT1980 = DAT[DAT.index.year==1980]
DAT1980
DAT1981 = DAT[DAT.index.year==1981
DAT1981
...etc
in terms of grouping the months for the x-axis I tried grouping months using the command
datmonth = np.unique(DAT.index.month)
so far all of these commands caused no error
however as I plot the graph I got this error
Graph plot command
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
ax.plot(datmonth, DAT1980, color='purple', linestyle='--', label='1980')
ax.grid()
plt.legend()
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
axs.set_xlim(3, 5)
axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
which I got "ValueError: x and y must have same first dimension, but have shapes (12,) and (366, 1)" as the error
I then tried
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
ax.plot(DAT.index.month, DAT.index.year==1980, color='purple', linestyle='--', label='1980')
ax.grid()
ax.plot(DAT.index.month, DAT.index.year==1981, color='black', marker='o', linestyle='-', label='C1981')
ax.grid()
plt.legend()
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
#axs.set_xlim(1, 12)
axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
and it worked better than the previous graph but still not what I wanted (please check out the graph here)
as my intention is to create a graph similar to this
I wholeheartedly appreciate any suggestion you may have! Thank you so so much and if you need any further information please do not hesitate to ask, I will reply as soon as possible.
Upvotes: 2
Views: 1969
Reputation: 8790
Welcome to SO! Nice job creating a clear description of your issue and showing lots of code : )
There are a few syntax issues here and there, but the main issue I see is that you need to add a groupby/aggregation operation at some point. That is, you have daily data, but your desired plot has monthly resolution (for each year). It sounds like you want an average of the daily values for each month for each year (correct me if that is wrong).
Here is some fake data:
dr = pd.date_range('01-01-1980', '12-31-2013', freq='1D')
flow = np.random.rand(len(dr))
df = pd.DataFrame(flow, columns=['flow'], index=dr)
Looks like your example:
flow
1980-01-01 0.751287
1980-01-02 0.411040
1980-01-03 0.134878
1980-01-04 0.692086
1980-01-05 0.671108
...
2013-12-27 0.683654
2013-12-28 0.772894
2013-12-29 0.380631
2013-12-30 0.957220
2013-12-31 0.864612
[12419 rows x 1 columns]
You can use groupby
to get a mean for each month, using the same datetime attributes you use above (with some additional methods to help make the data easier to work with)
monthly = (df.groupby([df.index.year, df.index.month])
.mean()
.rename_axis(index=['year', 'month'],)
.reset_index())
monthly
has flow data for each month for each year, i.e. what you want to plot:
year month flow
0 1980 1 0.514496
1 1980 2 0.633738
2 1980 3 0.566166
3 1980 4 0.553763
4 1980 5 0.537686
.. ... ... ...
403 2013 8 0.402805
404 2013 9 0.479226
405 2013 10 0.446874
406 2013 11 0.526942
407 2013 12 0.599161
[408 rows x 3 columns]
Now to plot an individual year, you index it from monthly
and plot the flow data. I use most of your axes formatting:
# make figure
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
# plotting for one year
sub = monthly[monthly['year'] == 1980]
ax.plot(sub['month'], sub['flow'], color='purple', linestyle='--', label='1980')
# some formatting
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
ax.set_xticks(range(1, 13))
ax.set_xticklabels(['J','F','M','A','M','J','J','A','S','O','N','D'])
ax.legend()
ax.grid()
Producing the following:
You could instead plot several years using a loop of some sort:
years = [1980, 1981, 1982, ...]
for year in years:
sub = monthly[monthly['year'] == year]
ax.plot(sub['month'], sub['flow'], ...)
You many run into some other challenges here (like finding a way to set nice styling for 30+ lines, and doing so in a loop). You can open a new post (building off of this one) if you can't find out how to accomplish something through other posts here. Best of luck!
Upvotes: 3