Reputation: 379
I have a DataFrame, called weights:
| person | age | weight_at_time_1 | weight_at_time_2 |
| Joe | 23 | 280 | 240 |
| Mary | 19 | 111 | 90 |
| Tom | 34 | 150 | 100 |
I want to find out the highest weight loss (essentially, where the the difference in weight is the most negative) and find out what this weight_at_time_1 and weight_at_time_2 were that yielded the result, to see the significance of the weight loss. As well as the name of the person who lost it.
weights['delta_weight] = weights['weight_at_time_2'] - ['weight_at_time_1]
weights['delta_weight'].min()
This tells me that the most negative change in weight (highest weight loss) was -50.
I want to report back the weight_at_time_1 and weight_at_time_2 which yielded this min(). Is there a way to perhaps retrieve the index for the row at which min() is found? Or do I have to loop through the DataFrame and keep track of that?
Upvotes: 0
Views: 55
Reputation: 17478
If you have multiple max/min you can also use this:
delta = df.weight_at_time_2 - df.weight_at_time_1
df.loc[delta == delta.min()]
To answer your comment:
In [3]: delta = df.weight_at_time_2 - df.weight_at_time_1
In [4]: bool_idx = delta == delta.min()
# In this way, we are actually using the *Boolean indexing*,
# a boolean vectors to filter the data out of a DataFrame
In [5]: bool_idx
Out[5]:
0 False
1 False
2 True
dtype: bool
# These two lines are equivalent, the result is a DataFrame,
# contains all the rows that match the True/False in the
# same position of `bool_idx`
# In [6]: df.loc[bool_idx]
In [6]: df.loc[bool_idx, :]
Out[6]:
person age weight_at_time_1 weight_at_time_2
2 Tom 34 150 100
# To specify the column label, we can get a Series out the
# filtered DataFrame
In [7]: df.loc[bool_idx, 'person']
Out[7]:
2 Tom
Name: person, dtype: object
# To drop the Series data structure
# - use `.values` property to get a `numpy.ndarray`
# - use `.to_list()` method to get a list
In [8]: df.loc[bool_idx, 'person'].values
Out[8]: array(['Tom'], dtype=object)
In [9]: df.loc[bool_idx, 'person'].to_list()
Out[9]: ['Tom']
# Now, at this time I think you must know many ways
# to get only a string 'Tom' out of above results :)
By the way, @WeNYoBen's great answer is the way of Selection By Label, while this answer is the way of Selection By Boolean Indexing.
For better understanding, I would also suggest you to read through this great official doc for Indexing and Selecting Data of Pandas.
Upvotes: 3
Reputation: 323276
Here is one way using idxmin
df.loc[[(df.weight_at_time_1-df.weight_at_time_2).idxmin()],:]
person age weight_at_time_1 weight_at_time_2
1 Mary 19 111 90
Upvotes: 3