I have one year of traffic data stored in a DataFrame.
study time | volume | month | hour | day | year | weekday | week_of_year | weekend |
---|---|---|---|---|---|---|---|---|
2019-01-01 00:00:00 | 25 | January | 0 | Tuesday | 2019 | 1 | 1 | 0 |
2019-01-01 00:00:15 | 25 | January | 0 | Tuesday | 2019 | 1 | 1 | 0 |
2019-01-01 00:00:30 | 21 | January | 0 | Tuesday | 2019 | 1 | 1 | 0 |
2019-01-02 00:00:00 | 100 | January | 0 | Wednesday | 2019 | 2 | 1 | 0 |
2019-01-02 00:00:15 | 2 | January | 0 | Wednesday | 2019 | 2 | 1 | 0 |
2019-01-02 00:00:30 | 50 | January | 0 | Wednesday | 2019 | 2 | 1 | 0 |
I want to see the hourly, daily, weekly and monthly patterns in the volume data. I did so using this script:
```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(16,10))
plt.axes(ax[0,0])
countData19_gdf.groupby(['hour','address']).mean().groupby(['hour'])['volume'].mean().plot(x='hour',y='volume')
plt.ylabel("Total average counts of the stations")
plt.axes(ax[0,1])
countData19_gdf.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().plot(x='day',y='volume')
plt.axes(ax[1,0])
countData19_gdf.groupby(['week_of_year','address']).mean().groupby(['week_of_year'])['volume'].mean().plot(x='week_of_year',y='volume', rot=90)
plt.ylabel("Total average counts of the stations")
plt.axes(ax[1,1])
countData19_gdf.groupby(['month','address']).mean().groupby(['month'])['volume'].mean().plot(x='month',y='volume', rot=90)
plt.ylabel("Total average counts of the stations")
ax[0,0].title.set_text('Hourly')
ax[0,1].title.set_text('Daily')
ax[1,0].title.set_text('Weekly')
ax[1,1].title.set_text('Monthly')
plt.savefig('temporal_global.png')
```
In the resulting figure, the weekdays and the months are not in calendar order.
How can I sort them? I tried sorting the days as integers, but it did not work.
The groupby method sorts the group keys by default (sort=True). For string keys, that means alphabetical order, not, for example, the order of the weekdays.
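A minimal sketch of that behaviour, using hypothetical sample data (three weekdays, two stations; the "address" column is assumed from the question's code, since it is not shown in the table):

```python
import pandas as pd

# Hypothetical sample data: two stations (A, B) on three different weekdays.
df = pd.DataFrame({
    "address": ["A", "B", "A", "B", "A", "B"],
    "day": ["Tuesday", "Tuesday", "Wednesday", "Wednesday", "Friday", "Friday"],
    "volume": [25, 21, 100, 50, 10, 30],
})

# groupby sorts its keys by default, so string keys come out alphabetically:
# Friday, Tuesday, Wednesday (not calendar order).
daily = df.groupby("day")["volume"].mean()
print(daily)
```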
What you can do is use the reindex method to put the index in the order you want. For example:
```python
countData19_gdf.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().reindex(['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']).plot(x='day',y='volume')
```
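A self-contained sketch of the same two-step average followed by reindex, on hypothetical sample data. It selects "volume" before the first mean(); that keeps string columns such as month out of the average, which matters because pandas 2.0 and later raise a TypeError when mean() meets non-numeric columns.

```python
import pandas as pd

# Hypothetical sample data; "address" stands in for the station column.
df = pd.DataFrame({
    "address": ["A", "B", "A", "B", "A", "B"],
    "day": ["Tuesday", "Tuesday", "Wednesday", "Wednesday", "Friday", "Friday"],
    "volume": [25, 21, 100, 50, 10, 30],
})

week_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Mean per (day, station), then mean across stations for each day,
# then reorder the index to calendar order.
daily = (
    df.groupby(["day", "address"])["volume"].mean()
      .groupby(level="day").mean()
      .reindex(week_order)
)
print(daily)
```

Days that never occur in the data (Monday, Thursday, Saturday, Sunday here) still appear, with NaN values.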
Note:
If a value in the index is not present in the list passed to reindex, that row will not be included. Likewise, if the list contains a value that is not present in the index, that new label gets a NaN value. So, if your countData19_gdf doesn't have a day such as Monday, Monday will still be present in the reindexed result, but its value will be NaN.
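Both cases from the note can be checked on a small hypothetical Series of daily averages. The last step shows one way (an illustration, not the only option) to avoid NaN gaps: keep only the labels that actually occur before reindexing.

```python
import pandas as pd

# Hypothetical daily averages, in the alphabetical order groupby produces.
daily = pd.Series({"Friday": 20.0, "Tuesday": 23.0, "Wednesday": 75.0}, name="volume")

# "Monday" is not in the index, so it gets NaN;
# "Friday" is not in the list, so it is dropped.
partial = daily.reindex(["Monday", "Tuesday", "Wednesday"])
print(partial)

# Keep calendar order but only for days that are present, so no NaN appears.
week_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
present = [d for d in week_order if d in daily.index]
trimmed = daily.reindex(present)
print(trimmed)
```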
Edit:
Since you already have numerical values for weekday (you might want to do the same for months), you can avoid specifying the new index by hand and get the day names in sorted order via:
```python
countData19_gdf.sort_values(by = 'weekday')['day'].unique()
```
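A sketch combining that line with reindex, on hypothetical sample data whose "weekday" column follows the table's convention (Tuesday = 1, Wednesday = 2; Friday = 4 is assumed on the same Monday = 0 scheme). unique() keeps the order of first appearance, so after sorting by weekday the names come out in calendar order.

```python
import pandas as pd

# Hypothetical sample data, deliberately not in calendar order.
df = pd.DataFrame({
    "address": ["A", "B", "A", "B", "A", "B"],
    "day": ["Friday", "Friday", "Tuesday", "Tuesday", "Wednesday", "Wednesday"],
    "weekday": [4, 4, 1, 1, 2, 2],
    "volume": [10, 30, 25, 21, 100, 50],
})

# Day names ordered by their numeric weekday.
day_order = df.sort_values(by="weekday")["day"].unique()

# Only days present in the data appear in day_order, so reindex adds no NaN.
daily = (
    df.groupby(["day", "address"])["volume"].mean()
      .groupby(level="day").mean()
      .reindex(day_order)
)
print(daily)
```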
Quick example (I changed some of the 'day' values in the given data to show the issue):
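```python
df.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().plot(x='day',y='volume')
```
Output: a line plot with the days in alphabetical order (Friday, Tuesday, Wednesday).
```python
df.groupby(['day','address']).mean().groupby(['day'])['volume'].mean().reindex(['Tuesday','Wednesday','Friday']).plot(x='day',y='volume')
```
Output: a line plot with the days in the order Tuesday, Wednesday, Friday.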
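For the months, one option that needs no numeric month column is Python's calendar module: calendar.month_name and calendar.day_name give English names in calendar order under the default locale (an assumption; other locales give translated names). The helper mean_volume_by below is hypothetical, a sketch of the question's two-step average with an optional ordering, tested here on made-up data.

```python
import calendar

import pandas as pd

# calendar.month_name starts with an empty string, so skip index 0.
MONTH_ORDER = list(calendar.month_name)[1:]  # January ... December
DAY_ORDER = list(calendar.day_name)          # Monday ... Sunday


def mean_volume_by(df, key, order=None):
    """Hypothetical helper: mean volume per station, then across stations."""
    result = df.groupby([key, "address"])["volume"].mean().groupby(level=key).mean()
    if order is not None:
        # Keep only labels that occur, so the plot has no NaN gaps.
        result = result.reindex([label for label in order if label in result.index])
    return result


# Hypothetical sample data.
df = pd.DataFrame({
    "address": ["A", "B", "A", "B"],
    "month": ["March", "March", "January", "February"],
    "day": ["Sunday", "Monday", "Tuesday", "Tuesday"],
    "volume": [10, 30, 25, 21],
})

monthly = mean_volume_by(df, "month", MONTH_ORDER)
daily = mean_volume_by(df, "day", DAY_ORDER)
print(monthly)
print(daily)
```

In the question's figure, something like mean_volume_by(countData19_gdf, "month", MONTH_ORDER).plot(ax=ax[1, 1], rot=90) could then replace the monthly one-liner; passing ax= to plot also makes the plt.axes calls unnecessary.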