Reputation: 34086
This is an extension to my previous question
:
Consider below df
:
In [68]: df = pd.DataFrame({'A': ['a'] * 11,
...: 'B': ['b'] * 11,
...: 'C': ['C1', 'C1', 'C2','C1', 'C3', 'C3', 'C2', 'C3', 'C3', 'C2', 'C2'],
...: 'D': ['D1', 'D2', 'D1', 'D3', 'D3', 'D2', 'D4', 'D4', 'D1', 'D2', 'D3'],
...: 'E': [{'value': '4', 'percentage': None}, {'value': 5, 'percentage': None}, {'value': 12, 'percentage': None}, {'value': 9, 'percentage': None}, {'value': '12', 'percentage': None}, {'value': 'N/A', 'percentage': None}, {}, {'valu
...: e': 24, 'percentage': None}, {'value': 12, 'percentage': None}, {'value': 33, 'percentage': None}, {'value': 11, 'percentage': None}]})
...:
Pivot
of above df:
In [69]: x = df.pivot(['B', 'C', 'D'], 'A', ['E'])
In [70]: x
Out[70]:
E
A a
B C D
b C1 D1 {'value': '4', 'percentage': None}
D2 {'value': 5, 'percentage': None}
D3 {'value': 9, 'percentage': None}
C2 D1 {'value': 12, 'percentage': None}
D2 {'value': 33, 'percentage': None}
D3 {'value': 11, 'percentage': None}
D4 {}
C3 D1 {'value': 12, 'percentage': None}
D2 {'value': 'N/A', 'percentage': None}
D3 {'value': '12', 'percentage': None}
D4 {'value': 24, 'percentage': None}
I want to sort the innermost column which is D
for each group of outer columns B and C
based on the multi-level column with index (E, a)
in asc/desc
order based on value
key from dict.
But, for every group there would be a row with the total
value of all other rows. I always need to keep that row at the last irrespective of sorting order(asc or desc).
Expected output in case of desc
:
Out[70]:
E
A a
B C D
b C1 D2 {'value': 5, 'percentage': None}
D1 {'value': '4', 'percentage': None}
D3 {'value': 9, 'percentage': None}
C2 D1 {'value': 12, 'percentage': None}
D3 {'value': 11, 'percentage': None}
D4 {}
D2 {'value': 33, 'percentage': None}
C3 D1 {'value': 12, 'percentage': None}
D3 {'value': '12', 'percentage': None}
D2 {'value': 'N/A', 'percentage': None}
D4 {'value': 24, 'percentage': None}
Expected output in case of asc
:
Out[70]:
E
A a
B C D
b C1 D1 {'value': '4', 'percentage': None}
D2 {'value': 5, 'percentage': None}
D3 {'value': 9, 'percentage': None}
C2 D3 {'value': 11, 'percentage': None}
D1 {'value': 12, 'percentage': None}
D4 {}
D2 {'value': 33, 'percentage': None}
C3 D1 {'value': 12, 'percentage': None}
D3 {'value': '12', 'percentage': None}
D2 {'value': 'N/A', 'percentage': None}
D4 {'value': 24, 'percentage': None}
Upvotes: 2
Views: 90
Reputation: 863236
Solution with helper columns used for sorting - first convert values
to numeric columns by Series.str.get
and to_numeric
and then create another column compared max value per groups for boolean:
lvls = list(x.index.names[:-1])
print (lvls)
['B', 'C']
x[('tmp', 'tmp')] = pd.to_numeric(x[('E','a')].str.get('value'), errors='coerce')
x[('max','tmp')] = x.groupby(lvls)[[('tmp','tmp')]].transform('max') == x[[('tmp','tmp')]]
All values in ascending parameter are True
, default value:
x1 = x.sort_values(lvls + [('max','tmp'), ('tmp','tmp')])
print (x1)
E tmp max
A a tmp tmp
B C D
b C1 D1 {'value': '4', 'percentage': None} 4.0 False
D2 {'value': 5, 'percentage': None} 5.0 False
D3 {'value': 9, 'percentage': None} 9.0 True
C2 D3 {'value': 11, 'percentage': None} 11.0 False
D1 {'value': 12, 'percentage': None} 12.0 False
D4 {} NaN False
D2 {'value': 33, 'percentage': None} 33.0 True
C3 D1 {'value': 12, 'percentage': None} 12.0 False
D3 {'value': '12', 'percentage': None} 12.0 False
D2 {'value': 'N/A', 'percentage': None} NaN False
D4 {'value': 24, 'percentage': None} 24.0 True
Here is changed last True
to False
:
x2 = x.sort_values(lvls + [('max','tmp'), ('tmp','tmp')],
ascending=[True] * len(lvls) + [True, False])
print (x2)
E tmp max
A a tmp tmp
B C D
b C1 D2 {'value': 5, 'percentage': None} 5.0 False
D1 {'value': '4', 'percentage': None} 4.0 False
D3 {'value': 9, 'percentage': None} 9.0 True
C2 D1 {'value': 12, 'percentage': None} 12.0 False
D3 {'value': 11, 'percentage': None} 11.0 False
D4 {} NaN False
D2 {'value': 33, 'percentage': None} 33.0 True
C3 D1 {'value': 12, 'percentage': None} 12.0 False
D3 {'value': '12', 'percentage': None} 12.0 False
D2 {'value': 'N/A', 'percentage': None} NaN False
D4 {'value': 24, 'percentage': None} 24.0 True
Last remove helper columns:
x1 = x1.drop([('max','tmp'), ('tmp','tmp')], axis=1)
x2 = x2.drop([('max','tmp'), ('tmp','tmp')], axis=1)
Upvotes: 2
Reputation: 71687
You can define a function that groups
the multilevel column E, a
on levels B and C
and returns the indices that would sort the dataframe as specified by the rule where the row with the total value of all other rows is kept at last:
def sort_idx(s):
idx = []
for k, g in s.groupby(level=[0, 1], sort=False):
i = g.idxmax()
idx += [*g.drop(i).sort_values().index , i]
return idx
s = pd.to_numeric(x[('E', 'a')].str['value'], errors='coerce')
x = x.loc[sort_idx(s)]
Result:
E
A a
B C D
b C1 D1 {'value': '4', 'percentage': None}
D2 {'value': 5, 'percentage': None}
D3 {'value': 9, 'percentage': None}
C2 D3 {'value': 11, 'percentage': None}
D1 {'value': 12, 'percentage': None}
D4 {}
D2 {'value': 33, 'percentage': None}
C3 D1 {'value': 12, 'percentage': None}
D3 {'value': '12', 'percentage': None}
D2 {'value': 'N/A', 'percentage': None}
D4 {'value': 24, 'percentage': None}
Upvotes: 2