Reputation: 157
I have a pandas DataFrame of the general form (let's call it df
)
A C D E F
0 0.4 0.1 0.2 0 0.7
1 0 0.2 0 0.5 0.6
2 0.1 0.5 0 0 0
3 0 0 0 0.8 0
4 0.8 0.1 0 0 0
5 0 0.9 0 0 0
Though each axis will be longer (21 columns, varying index length).
I need to find the largest number in the entire DataFrame, and extract the index and column name, then the next largest (if not in same index) and again find index and column name, and so on.
I can of course extract each index as a series (pd.Index.to_series
) or list, and do it item-wise. But I'm thinking there's a more efficient way using the dataframes (since I cannot predict the maximum index size this will eventually be used on), that I just can't seem to find/think of.
I've found plenty of answer using df.max()
or df.idmax()
, to get the maximum, if I only needed that.
Upvotes: 2
Views: 254
Reputation: 6327
Use the following:
df.stack().reset_index().sort_values(by=0, ascending=False)
Out[168]:
level_0 level_1 0
26 5 C 0.9
18 3 E 0.8
20 4 A 0.8
4 0 F 0.7
9 1 F 0.6
8 1 E 0.5
11 2 C 0.5
0 0 A 0.4
6 1 C 0.2
2 0 D 0.2
10 2 A 0.1
1 0 C 0.1
21 4 C 0.1
27 5 D 0.0
28 5 E 0.0
25 5 A 0.0
24 4 F 0.0
23 4 E 0.0
22 4 D 0.0
15 3 A 0.0
19 3 F 0.0
17 3 D 0.0
16 3 C 0.0
14 2 F 0.0
13 2 E 0.0
12 2 D 0.0
7 1 D 0.0
5 1 A 0.0
3 0 E 0.0
29 5 F 0.0
Upvotes: 3
Reputation: 195438
Try:
print(
df.reset_index()
.melt("index")
.sort_values(by="value", ascending=False)[["index", "variable"]]
)
Prints:
index variable
11 5 C
21 3 E
4 4 A
24 0 F
25 1 F
8 2 C
19 1 E
0 0 A
7 1 C
12 0 D
2 2 A
6 0 C
10 4 C
23 5 E
22 4 E
26 2 F
27 3 F
28 4 F
20 2 E
15 3 D
18 0 E
17 5 D
16 4 D
1 1 A
14 2 D
13 1 D
9 3 C
5 5 A
3 3 A
29 5 F
Upvotes: 2