Laxmikant
Laxmikant

Reputation: 2216

Categorize df based on column orders

I have to Sort the columns in the order C, D, E, B, A and categorize them in ascending order.

Where as -

df:

        A   B   C      D      E
    0   8   5   0  False   True
    1  45  35   0   True  False
    2  35  10   1  False   True
    3  40   5   2   True  False
    4  12  10   5  False  False
    5  18  15  13  False   True
    6  25  15   5   True  False
    7  35  10  11  False   True
    8  95  50   0  False  False

Conditions to Sort as per the below column order:

Example Condition order:

Max(C), D, E, Max(B) and Max(A) where as C!= 0  and B!=0
Also Column D with `True` and E with `False` should be on the top 

C, D, E, A, Max(B), and Max (A) where as C== 0 and B !=0
Also Column Ds with `True` and Es with `False` on the top 

C, D, E, A, B, and Max(A) where as C== 0 and B ==0
Also Column Ds with `True` and Es with `False` on the top 

Example categorization Order -

 Category 1 -  C, D(True), E(False), B, A and C!=0
 Category 2 -  C, D(True), E(True), B, A and C!=0
 Category 3 -  C, D(False), E(False), B, A and C!=0
 Category 4 -  C, D(False), E(True), B, A and C!=0

Then consider C=0

 Category 5 -  B, D(True), E(False), B, A and B!=0 and C=0
 Category 6 -  B, D(True), E(True), B, A and B!=0 and C=0
 Category 7 -  B, D(False), E(False), B, A and B!=0 and C=0
 Category 8 -  B, D(False), E(True), B, A and B!=0 and C=0

and so on

I tried, by updating ascending orders but not getting expected output:

df.sort_values(['C', 'D','E', 'B', 'A'], ascending=[False, False, False, False, False]) # without category

Here is the expected output: -

   A   B   C      D      E       Category
6  25  15   5   True  False       1    
3  40   5   2   True  False       1
4  12  10   5  False  False       3
5  18  15  13  False   True       4
7  35  10  11  False   True       4
2  35  10   1  False   True       4
1  45  35   0   True  False       5
8  95  50   0  False  False       6
0   8   5   0  False   True       7

Upvotes: 0

Views: 123

Answers (1)

jezrael
jezrael

Reputation: 863291

Use numpy.select for set new conditions by C,D,E columns, so then sorting by Category, A, B columns with default ascending order:

#test not equal 0
m0 = df['C'].ne(0) 

#chained True & False
m1 = df['D'] & ~df['E']
#chained True & True 
m2 = df['D'] & df['E']
#chained False & False
m3 = ~df['D'] & ~df['E']
#chained False & True
m4 = ~df['D'] & df['E']

df['Category'] = np.select([m0 & m1, m0 & m2, m0 & m3, m0 & m4,
                            ~m0 & m1, ~m0 & m2, ~m0 & m3, ~m0 & m4], [1,2,3,4,5,6,7,8])

df = df.sort_values(['Category','A','B']) 
print (df)
    A   B   C      D      E  Category
6  25  15   5   True  False         1
3  40   5   2   True  False         1
4  12  10   5  False  False         3
5  18  15  13  False   True         4
2  35  10   1  False   True         4
7  35  10  11  False   True         4
1  45  35   0   True  False         5
8  95  50   0  False  False         7
0   8   5   0  False   True         8

Upvotes: 1

Related Questions