Reputation: 829
I have a data frame that looks something like this.
import pandas as pd
data = [[5, 7, 10], [7, 20, 4,], [8, 1, 6,]]
cities = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=cities, index=cities)
Output:
          Boston  Phoenix  New York
Boston         5        7        10
Phoenix        7       20         4
New York       8        1         6
And I want to be able to find the city pair with the greatest value. In this case I would want to return Phoenix,Phoenix.
I have tried:
cityMax = df.values.max()
cityPairs = df.idxmax()
The first gives me only the largest value (20), and the second gives me each city's max pair rather than just the overall max. Is there a way to return the index and column header for a specified value in a dataframe?
Upvotes: 0
Views: 1031
Reputation: 109520
row_city, column_city = (df.max(axis=1).idxmax(), df.max(axis=0).idxmax())
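To spell out why this works: df.max(axis=1) holds each row's maximum, so its idxmax() is the row label containing the overall maximum, and df.max(axis=0) does the same for columns. Below is a minimal sketch on the question's data; the intermediate names row_max and col_max are only for illustration. One caveat: if the maximum value occurs in more than one cell, the row and column labels may come from different cells, so it is worth checking the result with df.loc.

```python
import pandas as pd

data = [[5, 7, 10], [7, 20, 4], [8, 1, 6]]
cities = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=cities, index=cities)

# Maximum of each row, indexed by row label (illustrative name)
row_max = df.max(axis=1)
# Maximum of each column, indexed by column label (illustrative name)
col_max = df.max(axis=0)

# Label of the row / column that holds the overall maximum
row_city = row_max.idxmax()
column_city = col_max.idxmax()

print(row_city, column_city)
# Sanity check: the cell at that pair should equal the overall maximum
print(df.loc[row_city, column_city] == df.values.max())
```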
Upvotes: 0
Reputation: 76917
You could try this too
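In [14]: import numpy as np
In [15]: df_mat = df.values  # as_matrix() is deprecated and was removed in pandas 1.0
In [16]: rows, cols = np.where(df_mat == np.amax(df_mat))  # np.where returns row positions first, then column positions
In [17]: ([df.index[row] for row in rows], [df.columns[col] for col in cols])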
Out[17]: (['Phoenix'], ['Phoenix'])
@piemont's method seems more elegant. However, I wonder which method would be faster on data of your size. Could you check by timing both on your full data?
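The np.where idea also covers the more general part of the question, finding the index and column header for any specified value. This is a rough sketch only; locate is a hypothetical helper name, not a pandas or NumPy function, and it assumes exact equality is the right comparison for your data.

```python
import numpy as np
import pandas as pd

def locate(df, value):
    """Hypothetical helper: list of (row label, column label) pairs where df == value."""
    # np.where on a 2-D array returns (row positions, column positions)
    rows, cols = np.where(df.values == value)
    return [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]

data = [[5, 7, 10], [7, 20, 4], [8, 1, 6]]
cities = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=cities, index=cities)

pairs_max = locate(df, df.values.max())  # every cell holding the overall maximum
pairs_seven = locate(df, 7)              # a value that appears in two cells
print(pairs_max)
print(pairs_seven)
```

Unlike idxmax(), this returns every matching cell, so ties are visible instead of being silently resolved to the first match.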
Upvotes: 1
Reputation: 36
Use unstack() to flatten the frame into a Series with a MultiIndex, then call idxmax() to get the labels of the maximum as a tuple:
import pandas as pd
data = [[5, 7, 10], [7, 20, 4,], [8, 1, 6,]]
cities = ['Boston', 'Phoenix', 'New York']
df = pd.DataFrame(data, columns=cities, index=cities)
print(df.unstack().idxmax())
returns:
('Phoenix', 'Phoenix')
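A note on ordering, since the example above is symmetric and hides it: on a DataFrame, unstack() puts the column label first in the tuple, while stack() puts the row label first. A small sketch with made-up, non-symmetric data (the labels r0, r1, c0, c1 are purely illustrative):

```python
import pandas as pd

# Illustrative frame where the maximum (9) sits at row 'r0', column 'c1'
small = pd.DataFrame([[1, 9], [3, 4]], index=['r0', 'r1'], columns=['c0', 'c1'])

col_first = small.unstack().idxmax()  # tuple ordered as (column label, row label)
row_first = small.stack().idxmax()    # tuple ordered as (row label, column label)

print(col_first)
print(row_first)
```

If you want the pair as (row, column), stack() is probably the more natural choice.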
Upvotes: 2