Alex McLean

Reputation: 2764

Return the index and column name of the nth largest value in a Pandas data series

How can I return the column name and index (or row name) of the nth largest or smallest value, efficiently enough to work on a matrix much larger than the example below?

import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
matrix = df.corr()
matrix
          A         B         C         D
A  1.000000 -0.814913  0.495993 -0.880296
B -0.814913  1.000000 -0.211421  0.551441
C  0.495993 -0.211421  1.000000 -0.414037
D -0.880296  0.551441 -0.414037  1.000000

I would then like to call something like this:

def get_n_smallest(matrix, n):
    # can return as two variables, list, tuple, whatever...
    return row_name, col_name

get_n_smallest(matrix,0)
# would return D, A for the value -.880296

Upvotes: 2

Views: 2546

Answers (1)

jezrael

Reputation: 862481

I think you can use stack to reshape the matrix into a Series with a MultiIndex, remove the mirrored values with drop_duplicates, sort with sort_values, and then read the (row, column) pair by indexing the index by position:

np.random.seed(100)
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
matrix = df.corr()
print (matrix)
          A         B         C         D
A  1.000000  0.570860 -0.558334 -0.434793
B  0.570860  1.000000 -0.358834 -0.564178
C -0.558334 -0.358834  1.000000  0.170589
D -0.434793 -0.564178  0.170589  1.000000

print (matrix.stack().drop_duplicates().sort_values())
B  D   -0.564178
A  C   -0.558334
   D   -0.434793
B  C   -0.358834
C  D    0.170589
A  B    0.570860
   A    1.000000
dtype: float64

def get_n_smallest(matrix, n):
    return matrix.stack().drop_duplicates().sort_values().index[n]

print (get_n_smallest(matrix,0))
('B', 'D')

print (get_n_smallest(matrix,1))
('A', 'C')

print (get_n_smallest(matrix,2))
('A', 'D')

def get_n_largest(matrix, n):
    return matrix.stack().drop_duplicates().sort_values(ascending=False).index[n]


print (get_n_largest(matrix,0))
('A', 'A')

print (get_n_largest(matrix,1))
('A', 'B')

print (get_n_largest(matrix,2))
('C', 'D')
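
One caveat with the approach above: drop_duplicates compares values only, so get_n_largest(matrix, 0) returns the diagonal entry ('A', 'A'), and two different pairs that happen to share the same correlation would collapse into one. As a possible alternative, here is a minimal sketch that selects the strictly upper triangle by position instead. The function name nth_extreme_pair and its largest parameter are hypothetical, not part of pandas, and the sketch assumes a square, symmetric matrix such as the output of df.corr():

```python
import numpy as np
import pandas as pd


def nth_extreme_pair(matrix, n, largest=False):
    """Return (row label, column label) of the nth smallest (or largest)
    off-diagonal value. Hypothetical helper; assumes a square symmetric matrix."""
    # k=1 skips the diagonal, so each unordered pair appears exactly once
    rows, cols = np.triu_indices(len(matrix), k=1)
    values = matrix.to_numpy()[rows, cols]
    # Positions of the upper-triangle values in ascending order
    order = np.argsort(values)
    if largest:
        order = order[::-1]
    i = order[n]
    return matrix.index[rows[i]], matrix.columns[cols[i]]


# The correlation matrix printed above, typed in by hand
labels = list('ABCD')
matrix = pd.DataFrame(
    [[1.000000, 0.570860, -0.558334, -0.434793],
     [0.570860, 1.000000, -0.358834, -0.564178],
     [-0.558334, -0.358834, 1.000000, 0.170589],
     [-0.434793, -0.564178, 0.170589, 1.000000]],
    index=labels, columns=labels)

print(nth_extreme_pair(matrix, 0))                # ('B', 'D')
print(nth_extreme_pair(matrix, 0, largest=True))  # ('A', 'B')
```

The smallest-value results match the stack-based version, while the largest-value results skip the self-correlation on the diagonal. For a very large matrix where only a few extremes are needed, np.argpartition could replace the full argsort, at the cost of slightly more code.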

Upvotes: 1
