umbum
umbum

Reputation: 31

Python Pandas groupby: filter and apply according to condition on values

I have this pandas dataframe

interval_mins = {
    '10' : 0.11,
    '15' : 0.4,
    '20' : 0.19
}
pd.DataFrame({
    'id' : [10, 15, 20, 10, 20, 15],
    'interval' : [0.1, 0.39, 0.2, 0.12, 0.25, 0.42]
})

In pandas DataFrame, I want to select items with interval values less than interval_mins for each id, and then add to the next interval value of the same id.

Is there a way without using for?

Expected output:

pd.DataFrame({
    'id' : [10, 15, 20, 10, 20, 15],
    'interval' : [0.1, 0.39, 0.2, 0.22, 0.25, 0.81]
})

enter image description here

Upvotes: 1

Views: 457

Answers (2)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Let's do:

m = df['interval'] < df['id'].astype(str).map(interval_mins)
df.loc[m.groupby(df['id']).shift(fill_value=False), 'interval'] += df.groupby('id')['interval'].shift()

Details:

Create a boolean mask representing condition where interval values are less than interval_mins for each id:

print(m)

0     True
1     True
2    False
3    False
4    False
5    False
dtype: bool

groupby the boolean mask m on id and shift downwards:

print(m.groupby(df['id']).shift(fill_value=False))

0    False
1    False
2    False
3     True
4    False
5     True
dtype: bool

groupby the dataframe on id and shift the interval column:

print(df.groupby('id')['interval'].shift())

0     NaN
1     NaN
2     NaN
3    0.10
4    0.20
5    0.39
Name: interval, dtype: float64

Use boolean indexing with loc to add the values corresponding to shifted mask:

print(df)

   id  interval
0  10      0.10
1  15      0.39
2  20      0.20
3  10      0.22
4  20      0.25
5  15      0.81

Upvotes: 2

piterbarg
piterbarg

Reputation: 8219

Consistent with your rule, but not with your output (see my comment), this should do it. I leave intermediate calculations in place so it is easier to understand what's going on. Note I replaced keys in interval_mis with ints from strs. Desired output in column interval_2

df['add'] = df['id'].map(interval_mins)
df['add_cond'] = df['add']*(df['add'] > df['interval'])
df = (df.groupby('id')
        .apply(lambda d: d.assign(add_cond_shift=d['add_cond'].shift()))
        .fillna(0)
        .reset_index(drop = True)
    )
df['interval_2'] = df['interval'] + df['add_cond_shift']
df

produces

    id  interval    add add_cond    add_cond_shift  interval_2
0   10  0.10        0.11    0.11    0.00    0.10
1   10  0.12        0.11    0.00    0.11    0.23
2   15  0.39        0.40    0.40    0.00    0.39
3   15  0.42       0.40     0.00    0.40    0.82
4   20  0.20       0.22     0.22    0.00    0.20
5   20  0.25       0.22     0.00    0.22    0.47

Upvotes: 0

Related Questions