Anna
Anna

Reputation: 379

Finding which row of the DataFrame

I have a DataFrame, called weights:

| person | age | weight_at_time_1 | weight_at_time_2 |
| Joe    | 23  | 280              | 240              |  
| Mary   | 19  | 111              | 90               |    
| Tom    | 34  | 150              | 100              |

I want to find out the highest weight loss (essentially, where the the difference in weight is the most negative) and find out what this weight_at_time_1 and weight_at_time_2 were that yielded the result, to see the significance of the weight loss. As well as the name of the person who lost it.

weights['delta_weight] = weights['weight_at_time_2'] - ['weight_at_time_1]
weights['delta_weight'].min()

This tells me that the most negative change in weight (highest weight loss) was -50.

I want to report back the weight_at_time_1 and weight_at_time_2 which yielded this min(). Is there a way to perhaps retrieve the index for the row at which min() is found? Or do I have to loop through the DataFrame and keep track of that?

Upvotes: 0

Views: 55

Answers (2)

YaOzI
YaOzI

Reputation: 17478

If you have multiple max/min you can also use this:

delta = df.weight_at_time_2 - df.weight_at_time_1
df.loc[delta == delta.min()]

To answer your comment:

In [3]: delta = df.weight_at_time_2 - df.weight_at_time_1

In [4]: bool_idx = delta == delta.min()

# In this way, we are actually using the *Boolean indexing*,
# a boolean vectors to filter the data out of a DataFrame
In [5]: bool_idx
Out[5]:
0    False
1    False
2     True
dtype: bool

# These two lines are equivalent, the result is a DataFrame,
# contains all the rows that match the True/False in the
# same position of `bool_idx`
# In [6]: df.loc[bool_idx]
In [6]: df.loc[bool_idx, :]
Out[6]:
  person  age  weight_at_time_1  weight_at_time_2
2    Tom   34               150               100

# To specify the column label, we can get a Series out the
# filtered DataFrame
In [7]: df.loc[bool_idx, 'person']
Out[7]:
2    Tom
Name: person, dtype: object

# To drop the Series data structure
#    - use `.values` property to get a `numpy.ndarray`
#    - use `.to_list()` method to get a list
In [8]: df.loc[bool_idx, 'person'].values
Out[8]: array(['Tom'], dtype=object)

In [9]: df.loc[bool_idx, 'person'].to_list()
Out[9]: ['Tom']

# Now, at this time I think you must know many ways
# to get only a string 'Tom' out of above results :)

By the way, @WeNYoBen's great answer is the way of Selection By Label, while this answer is the way of Selection By Boolean Indexing.

For better understanding, I would also suggest you to read through this great official doc for Indexing and Selecting Data of Pandas.

Upvotes: 3

BENY
BENY

Reputation: 323276

Here is one way using idxmin

df.loc[[(df.weight_at_time_1-df.weight_at_time_2).idxmin()],:]
  person  age  weight_at_time_1  weight_at_time_2
1   Mary   19               111                90

Upvotes: 3

Related Questions