Reputation: 1499
Hello I have the below dataframe and I am trying to calculate the absolute change for each City (row) and print the city name for the city with the largest value. I am able to get the absolute change but instead of getting the city name I am only able to get the index value. Is there a way I can get the city name instead of the index value. In the example below I am getting index value 3 instead of "Dallas".
df = pd.DataFrame({'City': ['Chicago', 'Atlanta', 'New York', 'Dallas'],
'col1': [10, 15, 80, 200],
'col2': [45, 75, 90, 5],
'col3': [25, 35, 60, 420],
'col4': [60, 105, 325, 55]})
Output:
City col1 col2 col3 col4
0 Chicago 10 45 25 60
1 Atlanta 15 75 35 105
2 New York 80 90 60 325
3 Dallas 200 5 420 55
Obtain Max Absolute Value of col1-col4
diff_row = df.max(axis=1) - df.min(axis=1)
print(diff_row.idxmax())
Current Output:
3
Desired Output:
Dallas
Upvotes: 2
Views: 822
Reputation: 294488
Taking a step back and using numpy.ptp
(peak to peak) to do the subtraction of max less min.
df.set_index('City').apply(np.ptp, 1).idxmax()
'Dallas'
We can push more over to numpy
with
df.City.values[np.ptp(df.set_index('City').values, 1).argmax()]
'Dallas'
Timing
%timeit df.City.values[np.ptp(df.set_index('City').values, 1).argmax()]
%timeit df.set_index('City').apply(np.ptp, 1).idxmax()
1000 loops, best of 3: 399 µs per loop
1000 loops, best of 3: 1.03 ms per loop
%%timeit
diff_row = df.max(axis=1) - df.min(axis=1)
df.loc[diff_row.idxmax(), 'City']
1000 loops, best of 3: 1.24 ms per loop
Upvotes: 1
Reputation: 394179
Pass the index label to loc
and the col of interest:
In[198]:
df.loc[diff_row.idxmax(), 'City']
Out[198]: 'Dallas'
Upvotes: 2