Densoto

Reputation: 89

Pandas: retrieve values in one or more columns corresponding to the maximum value in another

Relatively new Python scripter here with a quick question about Pandas and DataFrames. There may be an easier method in Python to do what I am doing (outside of Pandas), so I am open to any and all suggestions.

I have a large data set (don't we all), with dozens of attributes and tens of thousands of entries. I have successfully opened it (a .csv file) and removed the columns that are unnecessary for the exercise, and I have used pandas techniques I learned from other questions here to pare down the table to something I can use.

As an example, I now have dataframe df, with three columns - A, B and C. I need to find the index of the max of A and then pull the values of B and C at that index. Based off research on the best method, it seemed that idxmax was the best option.

MaxIDX = df['A'].idxmax()

This gives me the correct answer; however, when I then try to grab a value using at with this variable, I get errors. I believe it is because idxmax produces a Series and not an integer.

variable = df.at[MaxIDX, 'B']

So my question has two parts.

How do I convert the series to the proper input for at? And, is there an easier way to do this that I am completely missing? All I want to do is get the index of the max of column A, and then pull the values of Column B and C at that index.

Any help is appreciated. Thanks a bunch! Cheers!

Note: Using: Python 3.6.4 and Pandas 0.22.0

Upvotes: 3

Views: 1790

Answers (1)

cs95

Reputation: 403258

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))

df

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863


df.A.idxmax()
1
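
As a quick check on the idea that idxmax produces a Series: calling it on a single column returns one index label, while calling it on a DataFrame returns a Series of labels, one per column. The sketch below rebuilds the same example frame; the double-bracket selection `df[["A"]]` is only an assumption about how a Series result could arise, not something shown in the question.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Series.idxmax: a single index label for the row holding the maximum of A.
label = df["A"].idxmax()

# DataFrame.idxmax: a Series of labels, one entry per column.
# (Hypothetical cause: selecting with double brackets yields a DataFrame.)
per_column = df[["A"]].idxmax()

print(type(label), np.isscalar(label))
print(type(per_column))
```

If the second form is what ended up in MaxIDX, passing it to at would fail, because at expects a single row label and a single column label.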

What you say fails seems to work for me:

df.at[df.A.idxmax(), 'B']
1.8675579901499675

Although, based on your explanation, you may instead want loc, not at:

df.loc[df.A.idxmax(), ['B', 'C']]

B    1.867558
C   -0.977278
Name: 1, dtype: float64
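
If the goal is to end up with B and C as separate variables, the Series returned by loc can be unpacked directly. This is a minimal sketch on the same example frame; the variable names `b_at_max` and `c_at_max` are made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Look up the row label once, then pull both columns in a single loc call.
max_label = df["A"].idxmax()

# Iterating over a Series yields its values, so it unpacks like a tuple.
b_at_max, c_at_max = df.loc[max_label, ["B", "C"]]

print(b_at_max, c_at_max)
```

This relies on the label being unique in the index; otherwise loc returns more than one row and the unpacking no longer yields one value per column.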

Note: You may want to check that your index does not contain duplicate entries. This is one possible reason for failure.
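
To illustrate the duplicate-index case, here is a small sketch with a made-up frame whose index repeats the label 1. It checks `Index.is_unique`, shows that the label-based lookup matches more than one row, and falls back to a positional lookup with iloc. The data and variable names are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical frame: the index label 1 appears twice.
df = pd.DataFrame(
    {"A": [1.0, 5.0, 3.0], "B": [10, 20, 30], "C": [100, 200, 300]},
    index=[0, 1, 1],
)

# True when at least one index label is repeated.
has_duplicates = not df.index.is_unique

# idxmax returns the label 1, and loc matches every row with that label,
# so this is a two-row DataFrame rather than a single row.
by_label = df.loc[df["A"].idxmax(), ["B", "C"]]

# A positional lookup sidesteps the labels entirely.
pos = int(df["A"].to_numpy().argmax())
by_position = df[["B", "C"]].iloc[pos]

print(has_duplicates)
print(by_label)
print(by_position)
```

Calling `df.reset_index(drop=True)` before the lookup is another option when the original index labels are not needed.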

Upvotes: 1
