Aephir
Aephir

Reputation: 157

How do I sequentially get the index and column of the highest, next highest, etc. number in a pandas dataframe?

I have a pandas DataFrame of the general form (let's call it df)

    A       C       D       E       F
0   0.4     0.1     0.2     0       0.7
1   0       0.2     0       0.5     0.6
2   0.1     0.5     0       0       0
3   0       0       0       0.8     0
4   0.8     0.1     0       0       0
5   0       0.9     0       0       0

Though each axis will be longer (21 columns, varying index length).

I need to find the largest number in the entire DataFrame, and extract the index and column name, then the next largest (if not in same index) and again find index and column name, and so on.

I can of course extract each index as a series (pd.Index.to_series) or list, and do it item-wise. But I'm thinking there's a more efficient way using the dataframes (since I cannot predict the maximum index size this will eventually be used on), that I just can't seem to find/think of.

I've found plenty of answer using df.max() or df.idmax(), to get the maximum, if I only needed that.

Upvotes: 2

Views: 254

Answers (2)

Tom McLean
Tom McLean

Reputation: 6327

Use the following:

df.stack().reset_index().sort_values(by=0, ascending=False)
Out[168]: 
    level_0 level_1    0
26        5       C  0.9
18        3       E  0.8
20        4       A  0.8
4         0       F  0.7
9         1       F  0.6
8         1       E  0.5
11        2       C  0.5
0         0       A  0.4
6         1       C  0.2
2         0       D  0.2
10        2       A  0.1
1         0       C  0.1
21        4       C  0.1
27        5       D  0.0
28        5       E  0.0
25        5       A  0.0
24        4       F  0.0
23        4       E  0.0
22        4       D  0.0
15        3       A  0.0
19        3       F  0.0
17        3       D  0.0
16        3       C  0.0
14        2       F  0.0
13        2       E  0.0
12        2       D  0.0
7         1       D  0.0
5         1       A  0.0
3         0       E  0.0
29        5       F  0.0

Upvotes: 3

Andrej Kesely
Andrej Kesely

Reputation: 195438

Try:

print(
    df.reset_index()
    .melt("index")
    .sort_values(by="value", ascending=False)[["index", "variable"]]
)

Prints:

    index variable
11      5        C
21      3        E
4       4        A
24      0        F
25      1        F
8       2        C
19      1        E
0       0        A
7       1        C
12      0        D
2       2        A
6       0        C
10      4        C
23      5        E
22      4        E
26      2        F
27      3        F
28      4        F
20      2        E
15      3        D
18      0        E
17      5        D
16      4        D
1       1        A
14      2        D
13      1        D
9       3        C
5       5        A
3       3        A
29      5        F

Upvotes: 2

Related Questions