GeoBeez

Reputation: 1014

Plot sorted weekdays/months from a time series DataFrame in Python

I have one year of traffic data stored in a DataFrame.

study time           volume  month    hour  day        year  weekday  week_of_year  weekend
2019-01-01 00:00:00  25      January  0     Tuesday    2019  1        1             0
2019-01-01 00:00:15  25      January  0     Tuesday    2019  1        1             0
2019-01-01 00:00:30  21      January  0     Tuesday    2019  1        1             0
2019-01-02 00:00:00  100     January  0     Wednesday  2019  2        1             0
2019-01-02 00:00:15  2       January  0     Wednesday  2019  2        1             0
2019-01-02 00:00:30  50      January  0     Wednesday  2019  2        1             0

I want to see the hourly, daily, weekly and monthly patterns on volume data. I did so using this script:

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(16,10))
plt.axes(ax[0,0])

countData19_gdf.groupby(['hour','address']).mean().groupby(['hour'])['volume'].mean().plot(x='hour',y='volume')
plt.ylabel("Total average counts of the stations")

plt.axes(ax[0,1])
countData19_gdf.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().plot(x='day',y='volume')

plt.axes(ax[1,0])
countData19_gdf.groupby(['week_of_year','address']).mean().groupby(['week_of_year'])['volume'].mean().plot(x='week_of_year',y='volume', rot=90)
plt.ylabel("Total average counts of the stations")

plt.axes(ax[1,1])
countData19_gdf.groupby(['month','address']).mean().groupby(['month'])['volume'].mean().plot(x='month',y='volume', rot=90)
plt.ylabel("Total average counts of the stations")

ax[0,0].title.set_text('Hourly')
ax[0,1].title.set_text('Daily')
ax[1,0].title.set_text('Weekly')
ax[1,1].title.set_text('Monthly')

plt.savefig('temporal_global.png')

The result looks like this, but the weekdays and months are not sorted:

[Image: the saved 2x2 figure with the Hourly, Daily, Weekly and Monthly panels]

Can you please help me sort them? I tried sorting the days as integers, but that did not work.

Upvotes: 0

Views: 802

Answers (1)

dm2

Reputation: 4275

The groupby method sorts the group keys by default. For string values, that means sorting alphabetically (and not by, for example, the order of the weekdays).
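To see that alphabetical ordering in isolation, here is a minimal sketch on made-up data (the values and the variable names `daily_mean` and `alphabetical_order` are just for illustration, not taken from the question):

```python
import pandas as pd

# Made-up data: one volume reading per weekday, entered in calendar order.
df = pd.DataFrame({
    "day": ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"],
    "volume": [10, 20, 30, 40, 50, 60, 70],
})

# groupby sorts the group keys, so the string labels come back alphabetically.
daily_mean = df.groupby("day")["volume"].mean()
alphabetical_order = list(daily_mean.index)
print(alphabetical_order)
```

The index starts with Friday and ends with Wednesday, which is the order the plot then follows.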

What you can do is use the reindex method to put the index in the order you want. For example:

countData19_gdf.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().reindex(['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']).plot(x='day',y='volume')
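A self-contained sketch of the same idea, assuming a toy frame with the `day`, `address` and `volume` columns (the data, the `WEEKDAY_ORDER` constant and the intermediate names are hypothetical). It selects `volume` before taking the mean, which gives the same numbers as the one-liner above but avoids averaging non-numeric columns:

```python
import pandas as pd

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Hypothetical toy data: two stations ("A" and "B") with a few readings each.
df = pd.DataFrame({
    "address": ["A", "A", "B", "B", "A", "B"],
    "day": ["Wednesday", "Monday", "Monday", "Friday", "Friday", "Wednesday"],
    "volume": [30, 10, 20, 50, 70, 40],
})

# Mean per (day, station) first, then the mean across stations for each day.
per_station = df.groupby(["day", "address"])["volume"].mean()
per_day = per_station.groupby(level="day").mean()

# Keep only the weekdays that actually occur, in calendar order.
present = [d for d in WEEKDAY_ORDER if d in per_day.index]
ordered = per_day.reindex(present)
print(ordered)
```

Calling `ordered.plot()` afterwards should draw the days from Monday to Friday instead of alphabetically.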

Note:

If a value in the index is not present in the list passed to the reindex method, that row will not be included. Likewise, if the list contains a new value that is not present in the index, that new index label is assigned a NaN value. So, if your countData19_gdf has no rows for a day such as Monday, Monday will still be present in the reindexed result, but its value will be NaN.
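Both effects can be checked on a small Series (a sketch with made-up values; `per_day`, `reindexed` and `cleaned` are illustrative names):

```python
import pandas as pd

# Made-up per-day means; note there is no "Monday" entry.
per_day = pd.Series({"Friday": 60.0, "Tuesday": 15.0, "Wednesday": 35.0},
                    name="volume")

# "Friday" is not in the list, so it is dropped.
# "Monday" is not in the index, so it appears with NaN.
reindexed = per_day.reindex(["Monday", "Tuesday", "Wednesday"])
print(reindexed)

# If the NaN entries are unwanted in the plot, they can be dropped afterwards.
cleaned = reindexed.dropna()
```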

Edit:

Since you already have numerical values for weekday (you might want to get the same for months), you can avoid specifying the new index by hand and get the day names in sorted order via:

countData19_gdf.sort_values(by = 'weekday')['day'].unique()
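Put together, that could look like the sketch below. The data is made up, and the month part is an assumption on my side: the question shows no numeric month column, so the order is taken from the standard library's `calendar.month_name` (English names under the default locale) instead.

```python
import calendar
import pandas as pd

# Made-up data with a numeric "weekday" column next to the "day" names.
df = pd.DataFrame({
    "day": ["Wednesday", "Tuesday", "Friday", "Tuesday"],
    "weekday": [2, 1, 4, 1],
    "month": ["March", "January", "February", "January"],
    "volume": [30, 10, 50, 20],
})

# Day names in the order of their numeric weekday.
day_order = df.sort_values(by="weekday")["day"].unique()
per_day = df.groupby("day")["volume"].mean().reindex(day_order)

# Months: calendar order, restricted to the month names present in the data.
months_present = set(df["month"])
month_order = [m for m in calendar.month_name[1:] if m in months_present]
per_month = df.groupby("month")["volume"].mean().reindex(month_order)

print(per_day)
print(per_month)
```

Because both orders are built only from values that exist in the data, the reindexed results contain no NaN rows.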

Quick example (I changed around some 'day' values in the given data to display the issue):

df.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().plot(x='day',y='volume')

Outputs:

[Image: line plot with the days in alphabetical order]

df.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().reindex(['Tuesday','Wednesday','Friday']).plot(x='day',y='volume')

Outputs:

[Image: line plot with the days in the order Tuesday, Wednesday, Friday]

Upvotes: 3
