Nikhil
Nikhil

Reputation: 81

How to sort weekdays in pivot dataframe?

enter image description here

Hi,

I am trying to sort the weekdays in the above pivot table data frame. Initially, I used the below code to sort weekdays column:

df4 = df3.sort_values(by = "day_of_week")

But I got a result as below:

enter image description here

Then I used 'Ordered Categorical' for sorting. But By using the Ordered Categorical I got a key error:'day_of_week'

Ordered Categorical code:

c_s = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df3['day_of_week'] = pd.Categorical(df3['day_of_week'], categories=c_s, ordered=True)
df4 = df3.sort_values(by= "day_of_week")

Could you please suggest any function? so, that days can be sorted in the correct order without removing any day.

dataframe:

,,,mean,amin,amax,median,count_nonzero
,,,ride_length_min,ride_length_min,ride_length_min,ride_length_min,ride_id
member_casual,rideable_type,day_of_week,,,,,
casual,docked_bike,Friday,48.62283921068028,0.0,24703.0,22.0,18801
casual,docked_bike,Monday,52.296237045280655,0.0,37127.0,24.0,14377
casual,docked_bike,Saturday,50.72260119602127,0.0,36082.0,25.0,33277
casual,docked_bike,Sunday,55.521421616358325,0.0,32521.0,27.0,32864
casual,docked_bike,Thursday,59.29000334656021,0.0,38537.0,24.0,20917
casual,docked_bike,Tuesday,45.6941467880028,0.0,12181.0,24.0,18571
casual,docked_bike,Wednesday,45.113569228835395,0.0,9936.0,23.0,15911
member,docked_bike,Friday,17.73414485696896,0.0,1548.0,13.0,24645
member,docked_bike,Monday,17.079518285663685,0.0,1500.0,13.0,22586
member,docked_bike,Saturday,20.22856960896701,0.0,9922.0,16.0,31404
member,docked_bike,Sunday,20.08236931950015,0.0,1500.0,16.0,30169
member,docked_bike,Thursday,18.38059068748675,0.0,1500.0,14.0,28306
member,docked_bike,Tuesday,17.577247344793605,0.0,1500.0,14.0,27399
member,docked_bike,Wednesday,19.15993775759105,0.0,41271.0,14.0,23778

Thanks, in advance

Upvotes: 2

Views: 588

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35676

Note: this process is much more difficult after pivot. However, we can setup the Categorical for days of the week as suggested by Sorting pandas dataframe by weekdays, and change the index level to a new CategoricalDtype pandas: convert index type in multiindex dataframe

# Setup Categorical Dtype
c_s = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
       'Sunday']
cat_dtype = pd.CategoricalDtype(categories=c_s, ordered=True)
# Update Index Level Type
df3.index = df3.index.set_levels(
    df3.index.levels[-1].astype(cat_dtype), level=-1
)
# Sort values
df4 = df3.sort_values('day_of_week')

df4:

                                                   mean            amin            amax          median count_nonzero
                                        ride_length_min ride_length_min ride_length_min ride_length_min       ride_id
member_casual rideable_type day_of_week                                                                              
casual        docked_bike   Monday            52.296237             0.0         37127.0            24.0         14377
member        docked_bike   Monday            17.079518             0.0          1500.0            13.0         22586
casual        docked_bike   Tuesday           45.694147             0.0         12181.0            24.0         18571
member        docked_bike   Tuesday           17.577247             0.0          1500.0            14.0         27399
casual        docked_bike   Wednesday         45.113569             0.0          9936.0            23.0         15911
member        docked_bike   Wednesday         19.159938             0.0         41271.0            14.0         23778
casual        docked_bike   Thursday          59.290003             0.0         38537.0            24.0         20917
member        docked_bike   Thursday          18.380591             0.0          1500.0            14.0         28306
casual        docked_bike   Friday            48.622839             0.0         24703.0            22.0         18801
member        docked_bike   Friday            17.734145             0.0          1548.0            13.0         24645
casual        docked_bike   Saturday          50.722601             0.0         36082.0            25.0         33277
member        docked_bike   Saturday          20.228570             0.0          9922.0            16.0         31404
casual        docked_bike   Sunday            55.521422             0.0         32521.0            27.0         32864
member        docked_bike   Sunday            20.082369             0.0          1500.0            16.0         30169

Notice setting the CategoricalDtype before pivot is much easier as there is more support for changing a column dtype than a specific level of a MultiIndex dtype:

import pandas as pd

# Some Small Sample Data
df3 = pd.DataFrame({'member_casual': ['casual', 'member', 'member'],
                    'rideable_type': 'docked_bike',
                    'day_of_week': ['Wednesday', 'Tuesday', 'Monday'],
                    'a': 'mean',
                    'b': 'ride_length_min',
                    'c': 120})
# Setup Categorical Dtype
c_s = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
       'Sunday']
# Change ColumnDtype
df3['day_of_week'] = pd.Categorical(df3['day_of_week'],
                                    categories=c_s,
                                    ordered=True)
# Pivot and Sort
df4 = (
    df3.pivot(index=['member_casual', 'rideable_type', 'day_of_week'],
              columns=['a', 'b'],
              values='c')
        .sort_values('day_of_week')
        .rename_axis(columns=[None, None])
)
print(df4)

Sample df4:

                                                   mean
                                        ride_length_min
member_casual rideable_type day_of_week                
member        docked_bike   Monday                  120
                            Tuesday                 120
casual        docked_bike   Wednesday               120

DataFrame setup and imports:

import pandas as pd

df3 = pd.DataFrame([
    ['casual', 'docked_bike', 'Friday', 48.62283921068028, 0.0, 24703.0, 22.0,
     18801],
    ['casual', 'docked_bike', 'Monday', 52.296237045280655, 0.0, 37127.0, 24.0,
     14377],
    ['casual', 'docked_bike', 'Saturday', 50.72260119602127, 0.0, 36082.0, 25.0,
     33277],
    ['casual', 'docked_bike', 'Sunday', 55.521421616358325, 0.0, 32521.0, 27.0,
     32864],
    ['casual', 'docked_bike', 'Thursday', 59.29000334656021, 0.0, 38537.0, 24.0,
     20917],
    ['casual', 'docked_bike', 'Tuesday', 45.6941467880028, 0.0, 12181.0, 24.0,
     18571],
    ['casual', 'docked_bike', 'Wednesday', 45.1135692288354, 0.0, 9936.0, 23.0,
     15911],
    ['member', 'docked_bike', 'Friday', 17.73414485696896, 0.0, 1548.0, 13.0,
     24645],
    ['member', 'docked_bike', 'Monday', 17.079518285663685, 0.0, 1500.0, 13.0,
     22586],
    ['member', 'docked_bike', 'Saturday', 20.22856960896701, 0.0, 9922.0, 16.0,
     31404],
    ['member', 'docked_bike', 'Sunday', 20.08236931950015, 0.0, 1500.0, 16.0,
     30169],
    ['member', 'docked_bike', 'Thursday', 18.38059068748675, 0.0, 1500.0, 14.0,
     28306],
    ['member', 'docked_bike', 'Tuesday', 17.577247344793605, 0.0, 1500.0, 14.0,
     27399],
    ['member', 'docked_bike', 'Wednesday', 19.15993775759105, 0.0, 41271.0,
     14.0, 23778]
]).set_index([0, 1, 2])
df3.index.names = ['member_casual', 'rideable_type', 'day_of_week']
df3.columns = pd.MultiIndex.from_arrays([
    ['mean', 'amin', 'amax', 'median', 'count_nonzero'],
    ['ride_length_min', 'ride_length_min', 'ride_length_min', 'ride_length_min',
     'ride_id']
])

Upvotes: 2

Related Questions