Reputation: 49
How do I find the first of several minimum values in a dataset? Ultimately I want to find, sequentially, the values that are at least 2 greater than the minimum value.
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': [1,1,1,1,1,1,1], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})
I would like to identify df['value'][0] (0.6) as the first minimum in this column, and then identify df['value'][4] (2.8) as the value at least 2 greater than that first minimum (0.6).
My current attempt:
df = pd.DataFrame({'ID': [1,1,1,1,1,1,1], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})
# mark local minima: values no greater than both neighbours
df['loc_min'] = df.value[(df.value.shift(1) >= df.value) & (df.value.shift(-1) >= df.value)]
# carry the most recent local minimum forward within each ID
df['loc_min'] = df.groupby('ID', sort=False)['loc_min'].ffill()
df['condition'] = (df['value'] >= df['loc_min'] + 2)
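This works for other datasets, but not when the minimum comes first: row 0 has no previous value, so shift(1) is NaN there, the comparison evaluates to False, and 0.6 is never marked as a minimum.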
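Inspecting the intermediate mask from the attempt makes the failure visible. The diagnostic sketch below uses the question's data and rebuilds the same local-minimum mask as standalone Series (the names prev_val, next_val, is_local_min and loc_min are just for illustration). Only row 3 (1.2) is marked: row 0 and row 6 are skipped because comparisons against the NaN produced by shift are always False. As a result, even after forward-filling, 2.8 is compared against 1.2 + 2 rather than 0.6 + 2.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1,1,1,1,1,1,1], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})

prev_val = df['value'].shift(1)   # row 0 has no predecessor -> NaN
next_val = df['value'].shift(-1)  # last row has no successor -> NaN

# Any comparison against NaN is False, so the edge rows can never be marked.
is_local_min = (prev_val >= df['value']) & (next_val >= df['value'])
print(is_local_min.tolist())
# [False, False, False, True, False, False, False]

# Keep only the local minima, then forward-fill within each ID.
loc_min = df['value'].where(is_local_min).groupby(df['ID']).ffill()
print(loc_min.tolist())
# [nan, nan, nan, 1.2, 1.2, 1.2, 1.2]

condition = df['value'] >= loc_min + 2
print(condition.any())
# False
```

In other words, a local minimum is not the same thing as "the smallest value seen so far", which is what the expected output below describes.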
The ideal output would be:
   ID  value  loc_min  condition
0   1    0.6      NaN      False
1   1    1.5      0.6      False
2   1    1.6      0.6      False
3   1    1.2      0.6      False
4   1    2.8      0.6       True
5   1    0.3      0.3      False
6   1    0.2      0.2      False
As suggested in a comment, a loop would be a better way to go about this.
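For reference, a plain loop version could look like the sketch below. It assumes that "sequentially" means each value is compared with the smallest value seen so far within the same ID (a running minimum). The function name flag_above_running_min and the threshold parameter are hypothetical, not from any library.

```python
import pandas as pd

def flag_above_running_min(df, threshold=2.0):
    """Loop-based sketch: flag rows whose value is at least `threshold`
    above the smallest value seen so far within the same ID."""
    running_min = {}          # ID -> smallest value seen so far for that ID
    mins, flags = [], []
    for id_, v in zip(df['ID'], df['value']):
        m = min(running_min.get(id_, v), v)  # update the running minimum
        running_min[id_] = m
        mins.append(m)
        flags.append(v >= m + threshold)
    out = df.copy()
    out['running_min'] = mins
    out['condition'] = flags
    return out

df = pd.DataFrame({'ID': [1,1,1,1,1,1,1], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})
result = flag_above_running_min(df)
print(result['condition'].tolist())
# [False, False, False, False, True, False, False]
```

The dictionary keyed by ID lets the running minimum restart for each ID without requiring the rows to be sorted by ID.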
Upvotes: 0
Views: 1935
Reputation: 59274
Seems like you need cummin and a simple boolean comparison:
df['cummin_'] = df.groupby('ID').value.cummin()
df['condition'] = df.value >= df.cummin_ + 2
   ID  value  cummin_  condition
0   1    0.6      0.6      False
1   1    1.5      0.6      False
2   1    1.6      0.6      False
3   1    1.2      0.6      False
4   1    2.8      0.6       True
5   1    0.3      0.3      False
6   1    0.2      0.2      False
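Because the cumulative minimum is computed per group, it restarts for every ID. The self-contained sketch below runs the same two lines on a frame with a second ID (the same two-ID data used in the expanding example) to show that the flags are evaluated independently per ID.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1,1,1,1,1,1,1,2,2],
                   'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2, 0.4, 2.9]})

# cummin restarts for each ID because it is computed within each group
df['cummin_'] = df.groupby('ID')['value'].cummin()
df['condition'] = df['value'] >= df['cummin_'] + 2

print(df[df['condition']].index.tolist())
# [4, 8]
```

Row 8 is flagged because 2.9 is at least 2 above ID 2's own minimum of 0.4, regardless of the smaller values in ID 1.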
Another option is to use expanding. Take, for example,
df = pd.DataFrame({'ID': [1,1,1,1,1,1,1,2,2], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2,0.4,2.9]})
Then
df.groupby('ID').value.expanding(2).min()
ID   
1   0    NaN
    1    0.6
    2    0.6
    3    0.6
    4    0.6
    5    0.3
    6    0.2
2   7    NaN
    8    0.4
With expanding(2) (i.e. min_periods=2), the first row of each group is NaN, as in your expected output, whereas cummin includes the first value. It's just a matter of how you want the results to be interpreted.
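The expanding result is indexed by (ID, original row), so it has to be realigned before it can be used as a column. The sketch below is one way to do that, assuming the goal is the question's loc_min/condition layout. For ID 1 it reproduces the loc_min column from the ideal output (NaN, 0.6, 0.6, 0.6, 0.6, 0.3, 0.2).

```python
import pandas as pd

df = pd.DataFrame({'ID': [1,1,1,1,1,1,1,2,2],
                   'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2, 0.4, 2.9]})

# Per-group expanding minimum; min_periods=2 leaves the first row of each ID as NaN.
exp_min = df.groupby('ID')['value'].expanding(min_periods=2).min()

# The result has a (ID, original row) MultiIndex; drop the ID level to align with df.
df['loc_min'] = exp_min.reset_index(level=0, drop=True)

# Comparisons against NaN are False, so the first row of each ID is never flagged.
df['condition'] = df['value'] >= df['loc_min'] + 2
print(df)
```

Rows 4 and 8 end up flagged, the same as with cummin. The only difference between the two approaches is whether the first row of each ID shows NaN or its own value.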
Upvotes: 1