Reputation: 81
Hi,
I am trying to sort the weekdays in the above pivot table data frame. Initially, I used the below code to sort weekdays column:
df4 = df3.sort_values(by = "day_of_week")
But I got a result as below:
Then I used 'Ordered Categorical' for sorting. But By using the Ordered Categorical I got a key error:'day_of_week'
Ordered Categorical code:
c_s = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df3['day_of_week'] = pd.Categorical(df3['day_of_week'], categories=c_s, ordered=True)
df4 = df3.sort_values(by= "day_of_week")
Could you please suggest any function? so, that days can be sorted in the correct order without removing any day.
dataframe:
,,,mean,amin,amax,median,count_nonzero
,,,ride_length_min,ride_length_min,ride_length_min,ride_length_min,ride_id
member_casual,rideable_type,day_of_week,,,,,
casual,docked_bike,Friday,48.62283921068028,0.0,24703.0,22.0,18801
casual,docked_bike,Monday,52.296237045280655,0.0,37127.0,24.0,14377
casual,docked_bike,Saturday,50.72260119602127,0.0,36082.0,25.0,33277
casual,docked_bike,Sunday,55.521421616358325,0.0,32521.0,27.0,32864
casual,docked_bike,Thursday,59.29000334656021,0.0,38537.0,24.0,20917
casual,docked_bike,Tuesday,45.6941467880028,0.0,12181.0,24.0,18571
casual,docked_bike,Wednesday,45.113569228835395,0.0,9936.0,23.0,15911
member,docked_bike,Friday,17.73414485696896,0.0,1548.0,13.0,24645
member,docked_bike,Monday,17.079518285663685,0.0,1500.0,13.0,22586
member,docked_bike,Saturday,20.22856960896701,0.0,9922.0,16.0,31404
member,docked_bike,Sunday,20.08236931950015,0.0,1500.0,16.0,30169
member,docked_bike,Thursday,18.38059068748675,0.0,1500.0,14.0,28306
member,docked_bike,Tuesday,17.577247344793605,0.0,1500.0,14.0,27399
member,docked_bike,Wednesday,19.15993775759105,0.0,41271.0,14.0,23778
Thanks, in advance
Upvotes: 2
Views: 588
Reputation: 35676
Note: this process is much more difficult after pivot
. However, we can setup the Categorical for days of the week as suggested by Sorting pandas dataframe by weekdays, and change the index level to a new CategoricalDtype pandas: convert index type in multiindex dataframe
# Setup Categorical Dtype
c_s = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
'Sunday']
cat_dtype = pd.CategoricalDtype(categories=c_s, ordered=True)
# Update Index Level Type
df3.index = df3.index.set_levels(
df3.index.levels[-1].astype(cat_dtype), level=-1
)
# Sort values
df4 = df3.sort_values('day_of_week')
df4
:
mean amin amax median count_nonzero
ride_length_min ride_length_min ride_length_min ride_length_min ride_id
member_casual rideable_type day_of_week
casual docked_bike Monday 52.296237 0.0 37127.0 24.0 14377
member docked_bike Monday 17.079518 0.0 1500.0 13.0 22586
casual docked_bike Tuesday 45.694147 0.0 12181.0 24.0 18571
member docked_bike Tuesday 17.577247 0.0 1500.0 14.0 27399
casual docked_bike Wednesday 45.113569 0.0 9936.0 23.0 15911
member docked_bike Wednesday 19.159938 0.0 41271.0 14.0 23778
casual docked_bike Thursday 59.290003 0.0 38537.0 24.0 20917
member docked_bike Thursday 18.380591 0.0 1500.0 14.0 28306
casual docked_bike Friday 48.622839 0.0 24703.0 22.0 18801
member docked_bike Friday 17.734145 0.0 1548.0 13.0 24645
casual docked_bike Saturday 50.722601 0.0 36082.0 25.0 33277
member docked_bike Saturday 20.228570 0.0 9922.0 16.0 31404
casual docked_bike Sunday 55.521422 0.0 32521.0 27.0 32864
member docked_bike Sunday 20.082369 0.0 1500.0 16.0 30169
Notice setting the CategoricalDtype
before pivot is much easier as there is more support for changing a column dtype than a specific level of a MultiIndex dtype:
import pandas as pd
# Some Small Sample Data
df3 = pd.DataFrame({'member_casual': ['casual', 'member', 'member'],
'rideable_type': 'docked_bike',
'day_of_week': ['Wednesday', 'Tuesday', 'Monday'],
'a': 'mean',
'b': 'ride_length_min',
'c': 120})
# Setup Categorical Dtype
c_s = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
'Sunday']
# Change ColumnDtype
df3['day_of_week'] = pd.Categorical(df3['day_of_week'],
categories=c_s,
ordered=True)
# Pivot and Sort
df4 = (
df3.pivot(index=['member_casual', 'rideable_type', 'day_of_week'],
columns=['a', 'b'],
values='c')
.sort_values('day_of_week')
.rename_axis(columns=[None, None])
)
print(df4)
Sample df4
:
mean
ride_length_min
member_casual rideable_type day_of_week
member docked_bike Monday 120
Tuesday 120
casual docked_bike Wednesday 120
DataFrame setup and imports:
import pandas as pd
df3 = pd.DataFrame([
['casual', 'docked_bike', 'Friday', 48.62283921068028, 0.0, 24703.0, 22.0,
18801],
['casual', 'docked_bike', 'Monday', 52.296237045280655, 0.0, 37127.0, 24.0,
14377],
['casual', 'docked_bike', 'Saturday', 50.72260119602127, 0.0, 36082.0, 25.0,
33277],
['casual', 'docked_bike', 'Sunday', 55.521421616358325, 0.0, 32521.0, 27.0,
32864],
['casual', 'docked_bike', 'Thursday', 59.29000334656021, 0.0, 38537.0, 24.0,
20917],
['casual', 'docked_bike', 'Tuesday', 45.6941467880028, 0.0, 12181.0, 24.0,
18571],
['casual', 'docked_bike', 'Wednesday', 45.1135692288354, 0.0, 9936.0, 23.0,
15911],
['member', 'docked_bike', 'Friday', 17.73414485696896, 0.0, 1548.0, 13.0,
24645],
['member', 'docked_bike', 'Monday', 17.079518285663685, 0.0, 1500.0, 13.0,
22586],
['member', 'docked_bike', 'Saturday', 20.22856960896701, 0.0, 9922.0, 16.0,
31404],
['member', 'docked_bike', 'Sunday', 20.08236931950015, 0.0, 1500.0, 16.0,
30169],
['member', 'docked_bike', 'Thursday', 18.38059068748675, 0.0, 1500.0, 14.0,
28306],
['member', 'docked_bike', 'Tuesday', 17.577247344793605, 0.0, 1500.0, 14.0,
27399],
['member', 'docked_bike', 'Wednesday', 19.15993775759105, 0.0, 41271.0,
14.0, 23778]
]).set_index([0, 1, 2])
df3.index.names = ['member_casual', 'rideable_type', 'day_of_week']
df3.columns = pd.MultiIndex.from_arrays([
['mean', 'amin', 'amax', 'median', 'count_nonzero'],
['ride_length_min', 'ride_length_min', 'ride_length_min', 'ride_length_min',
'ride_id']
])
Upvotes: 2