plot the sorted weekdays/month on timeseries dataframe in python

Question

I have one year of traffic data stored in a data frame.

study time	volume	month	day	year	weekday	week_of_year
2019-01-01 00:00:00	25	January	Tuesday	2019	1	1
2019-01-01 00:00:15	25	January	Tuesday	2019	1	1
2019-01-01 00:00:30	21	January	Tuesday	2019	1	1
2019-01-02 00:00:00	100	January	Wednesday	2019	2	1
2019-01-02 00:00:15	2	January	Wednesday	2019	2	1
2019-01-02 00:00:30	50	January	Wednesday	2019	2	1

I want to see the hourly, daily, weekly and monthly patterns on volume data. I did so using this script:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(16,10))
plt.axes(ax[0,0])

countData19_gdf.groupby(['hour','address']).mean().groupby(['hour'])['volume'].mean().plot(x='hour',y='volume')
plt.ylabel("Total average counts of the stations")

plt.axes(ax[0,1])
countData19_gdf.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().plot(x='day',y='volume')

plt.axes(ax[1,0])
countData19_gdf.groupby(['week_of_year','address']).mean().groupby(['week_of_year'])['volume'].mean().plot(x='week_of_year',y='volume', rot=90)
plt.ylabel("Total average counts of the stations")

plt.axes(ax[1,1])
countData19_gdf.groupby(['month','address']).mean().groupby(['month'])['volume'].mean().plot(x='month',y='volume', rot=90)
plt.ylabel("Total average counts of the stations")

ax[0,0].title.set_text('Hourly')
ax[0,1].title.set_text('Daily')
ax[1,0].title.set_text('Weekly')
ax[1,1].title.set_text('Monthly')

plt.savefig('temporal_global.png')

The result looks like this; the weekdays and months are not sorted.

Can you please help me sort them? I tried to sort the days as integers, but it does not work.

dm2 · Accepted Answer

The groupby method automatically sorts the index. For string values, however, that means sorting alphabetically (and not, for example, in weekday order).
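To make the alphabetical ordering concrete, here is a minimal sketch on toy data. The frame `df` and its values are made up for illustration and are not the asker's real counts.

```python
import pandas as pd

# Toy data (hypothetical values), deliberately not in weekday order
df = pd.DataFrame({
    "day": ["Monday", "Thursday", "Friday", "Monday"],
    "volume": [40, 25, 100, 20],
})

# groupby sorts the group keys; for strings that is alphabetical order
daily_mean = df.groupby("day")["volume"].mean()
print(list(daily_mean.index))  # ['Friday', 'Monday', 'Thursday']
```

Friday comes first only because "F" sorts before "M" and "T", which is why the plot's x-axis ends up out of weekday order.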

You can use the reindex method to put the index in the order you want. For example:

countData19_gdf.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().reindex(['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']).plot(x='day',y='volume')

Note:

reindex keeps only the labels you list. An index value that is missing from the list is dropped, and a listed label that is missing from the index is added with a NaN value. So if your countData19_gdf has no rows for a day such as Monday, Monday will still be present in the reindexed result, but its value will be NaN.
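The behaviour described in the note can be checked on a small Series. This is only a sketch: `daily_mean` and `week_order` are hypothetical names and the numbers are invented.

```python
import pandas as pd

# Hypothetical grouped result, in the alphabetical order groupby would give
daily_mean = pd.Series(
    [100.0, 30.0, 25.0],
    index=["Friday", "Monday", "Thursday"],
    name="volume",
)

# The list omits Friday and includes Tuesday, which is not in the index
week_order = ["Monday", "Tuesday", "Thursday"]
reordered = daily_mean.reindex(week_order)

print(reordered)
# Friday is dropped, Tuesday appears with NaN, Monday and Thursday keep their values
```

If the NaN entries are unwanted in the plot, calling `reordered.dropna()` before `.plot()` is one way to remove them.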

Edit:

Since you already have numerical values for weekday (you might want to get the same for months), to avoid specifying the new index by hand, you could get sorted string values via:

countData19_gdf.sort_values(by = 'weekday')['day'].unique()
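Putting the two pieces together, a minimal sketch might look like the following. The toy frame assumes, as in the question's sample, that `weekday` is numbered with Monday as 0; the month list from the standard `calendar` module is an added suggestion and assumes English month names (the default locale).

```python
import calendar
import pandas as pd

# Toy data (hypothetical values); weekday follows the Monday=0 convention
df = pd.DataFrame({
    "day": ["Thursday", "Monday", "Friday", "Monday"],
    "weekday": [3, 0, 4, 0],
    "volume": [25, 40, 100, 20],
})

# Day names in the order of their numeric weekday
day_order = df.sort_values(by="weekday")["day"].unique()

# Use that order to rearrange the grouped result before plotting
daily_mean = df.groupby("day")["volume"].mean().reindex(day_order)
print(list(daily_mean.index))  # ['Monday', 'Thursday', 'Friday']

# For months without a numeric column, the calendar module gives the names in order
month_order = list(calendar.month_name)[1:]  # element 0 is an empty string
```

Because `day_order` is built from the data, days that never occur are simply absent, so no NaN rows are introduced by the reindex.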

Quick example (I changed around some 'day' values in the given data to display the issue):

df.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().plot(x='day',y='volume')

Outputs:

df.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().reindex(['Tuesday','Wednesday','Friday']).plot(x='day',y='volume')

Outputs:
