LizzAlice

Reputation: 720

Column name and index of max value

I currently have a pandas dataframe that stores values between 0 and 1. I am looking for a function that gives me the top 5 values of each column, together with the column name and the index label of each value.

Sample input: a dataframe with columns a:z, index 1:23, and entries between 0 and 1

Sample output: for each column, an array of its 5 highest entries, each paired with the column name and index label

Edit: For the following data frame:

    import numpy as np
    import pandas as pd

    np.random.seed([3, 1415])
    df = pd.DataFrame(np.random.randint(10, size=(10, 4)), list('abcdefghij'), list('ABCD'))

df

      A  B  C  D
   a  0  2  7  3
   b  8  7  0  6
   c  8  6  0  2
   d  0  4  9  7
   e  3  2  4  3
   f  3  6  7  7
   g  4  5  3  7
   h  5  9  8  7
   i  6  4  7  6
   j  2  6  6  5

I would like output like the following (shown for the first column, A), where each entry is [value, index, column]:

 [[8, b, A], [8, c, A], [6, i, A], [5, h, A], [4, g, A]]
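One way to produce exactly this [value, index, column] format is pandas' Series.nlargest applied column by column. This is a minimal sketch, not taken from either answer: the helper name top_n_per_column is hypothetical, and the frame is typed in from the table above so the example does not depend on the random seed.

```python
import pandas as pd

# Sample frame typed in from the table above (avoids relying on the RNG).
df = pd.DataFrame(
    {
        "A": [0, 8, 8, 0, 3, 3, 4, 5, 6, 2],
        "B": [2, 7, 6, 4, 2, 6, 5, 9, 4, 6],
        "C": [7, 0, 0, 9, 4, 7, 3, 8, 7, 6],
        "D": [3, 6, 2, 7, 3, 7, 7, 7, 6, 5],
    },
    index=list("abcdefghij"),
)


def top_n_per_column(frame, n=5):
    """Hypothetical helper: {column: [[value, index_label, column], ...]}, largest first."""
    result = {}
    for col in frame.columns:
        # nlargest defaults to keep='first': tied values keep their order of appearance
        top = frame[col].nlargest(n)
        # tolist() turns numpy scalars into plain Python numbers
        result[col] = [[val, label, col] for label, val in zip(top.index, top.tolist())]
    return result


print(top_n_per_column(df)["A"])
```

Because nlargest uses a partial selection rather than a full sort, this stays cheap even for the 26-column, 0-to-1 float frame described in the question.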

Upvotes: 2

Views: 93

Answers (2)

piRSquared

Reputation: 294526

Consider the dataframe df:

import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(
    np.random.randint(10, size=(10, 4)), list('abcdefghij'), list('ABCD'))

df

   A  B  C  D
a  0  2  7  3
b  8  7  0  6
c  8  6  0  2
d  0  4  9  7
e  3  2  4  3
f  3  6  7  7
g  4  5  3  7
h  5  9  8  7
i  6  4  7  6
j  2  6  6  5

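I'm going to use np.argpartition to split each column into its len(df) - 5 smallest values (here 10 - 5 = 5) and its 5 largest values. argpartition does not sort within either part, so the index labels come back in no particular order.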

v = df.values
i = df.index.values

n_top = 5
k = len(v) - n_top  # size of the "smallest" part
pd.DataFrame(
    i[v.argpartition(k, 0)[k:]],  # rows k.. of the partition hold the n_top largest
    np.arange(n_top), df.columns
)

   A  B  C  D
0  g  f  i  i
1  b  c  a  d
2  h  h  f  h
3  i  b  d  f
4  c  j  h  g
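The same argpartition idea can also return the values and put each column's top block in descending order. This is a sketch that extends the answer with np.take_along_axis and a small argsort (my additions, not part of the original answer); the frame is again typed in from the table, and values_df and labels_df are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample frame typed in from the table above.
df = pd.DataFrame(
    {
        "A": [0, 8, 8, 0, 3, 3, 4, 5, 6, 2],
        "B": [2, 7, 6, 4, 2, 6, 5, 9, 4, 6],
        "C": [7, 0, 0, 9, 4, 7, 3, 8, 7, 6],
        "D": [3, 6, 2, 7, 3, 7, 7, 7, 6, 5],
    },
    index=list("abcdefghij"),
)

n_top = 5
v = df.to_numpy()
labels = df.index.to_numpy()
k = len(v) - n_top

# After partitioning along axis 0, rows k.. hold each column's n_top largest values,
# in no particular order (ties at the boundary are resolved arbitrarily).
top_pos = np.argpartition(v, k, axis=0)[k:]
top_vals = np.take_along_axis(v, top_pos, axis=0)

# Sort only the small (n_top, n_cols) block, largest value first.
order = np.argsort(-top_vals, axis=0, kind="stable")
top_pos = np.take_along_axis(top_pos, order, axis=0)
top_vals = np.take_along_axis(top_vals, order, axis=0)

values_df = pd.DataFrame(top_vals, columns=df.columns)
labels_df = pd.DataFrame(labels[top_pos], columns=df.columns)
print(values_df)
print(labels_df)
```

Sorting only the 5-row block keeps the extra cost small compared with fully sorting every column. Which of two tied values is kept at the cut-off (for example b or i in column D) is not guaranteed.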

Upvotes: 2

renzop

Reputation: 1316

# sort one column (a Series) in descending order and keep its first 5 rows
print(your_dataframe['A'].sort_values(ascending=False).head(5))
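The sort_values idea can cover all columns at once if the frame is first reshaped to long format. This sketch swaps in DataFrame.melt plus groupby().head(), which neither answer uses; the frame is typed in from the question's table, and long_df and top are illustrative names.

```python
import pandas as pd

# Sample frame typed in from the table in the question.
df = pd.DataFrame(
    {
        "A": [0, 8, 8, 0, 3, 3, 4, 5, 6, 2],
        "B": [2, 7, 6, 4, 2, 6, 5, 9, 4, 6],
        "C": [7, 0, 0, 9, 4, 7, 3, 8, 7, 6],
        "D": [3, 6, 2, 7, 3, 7, 7, 7, 6, 5],
    },
    index=list("abcdefghij"),
)

n_top = 5

# Long format: one row per (row label, column name, value).
long_df = df.rename_axis("row").reset_index().melt(
    id_vars="row", var_name="column", value_name="value"
)

top = (
    long_df.sort_values("value", ascending=False, kind="mergesort")  # stable: ties keep row order
    .groupby("column", sort=False)
    .head(n_top)  # first n_top rows of each column in the sorted order
    .sort_values("column", kind="mergesort")  # regroup by column, value order kept
)

print(top[["value", "row", "column"]].to_numpy().tolist())
```

Each printed entry has the [value, index, column] shape the question asks for, with all columns in a single list.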

Upvotes: 0
