MBasith
MBasith

Reputation: 1499

Python Pandas Get Row Position Name instead of Index Value

Hello I have the below dataframe and I am trying to calculate the absolute change for each City (row) and print the city name for the city with the largest value. I am able to get the absolute change but instead of getting the city name I am only able to get the index value. Is there a way I can get the city name instead of the index value. In the example below I am getting index value 3 instead of "Dallas".

df = pd.DataFrame({'City': ['Chicago', 'Atlanta', 'New York', 'Dallas'],
                   'col1': [10, 15, 80, 200],
                   'col2': [45, 75, 90, 5],
                   'col3': [25, 35, 60, 420],
                   'col4': [60, 105, 325, 55]})

Output:

       City  col1  col2  col3  col4
0   Chicago    10    45    25    60
1   Atlanta    15    75    35   105
2  New York    80    90    60   325
3    Dallas   200     5   420    55

Obtain Max Absolute Value of col1-col4

diff_row = df.max(axis=1) - df.min(axis=1)
print(diff_row.idxmax())

Current Output:

3

Desired Output:

Dallas

Upvotes: 2

Views: 822

Answers (2)

piRSquared
piRSquared

Reputation: 294488

Taking a step back and using numpy.ptp (peak to peak) to do the subtraction of max less min.

df.set_index('City').apply(np.ptp, 1).idxmax()

'Dallas'

We can push more over to numpy with

df.City.values[np.ptp(df.set_index('City').values, 1).argmax()]

'Dallas'

Timing

%timeit df.City.values[np.ptp(df.set_index('City').values, 1).argmax()]
%timeit df.set_index('City').apply(np.ptp, 1).idxmax()
1000 loops, best of 3: 399 µs per loop
1000 loops, best of 3: 1.03 ms per loop

%%timeit 
diff_row = df.max(axis=1) - df.min(axis=1)
df.loc[diff_row.idxmax(), 'City']
1000 loops, best of 3: 1.24 ms per loop

Upvotes: 1

EdChum
EdChum

Reputation: 394179

Pass the index label to loc and the col of interest:

In[198]:
df.loc[diff_row.idxmax(), 'City']

Out[198]: 'Dallas'

Upvotes: 2

Related Questions