moys
moys

Reputation: 8033

Sum a seprate colum based on the range of the dataframe between values in other columns after groupby

I have a dataframe as below

id  Supply  days    days_180
1   30         0    180
1   100      183    363
1   80       250    430
2   5          0    180
2   5         10    190
3   5          0    180
3   30       100    280
3   30       150    330
3   30       200    380
3   30       280    460
3   50       310    490

I want to sum 'Supply' where days are between 'days' & 'days+180' for each row. This needs to be done for each group after groupby('id').

The expected output is as below

id  Supply  days    days_180    use
1   30         0        180     30
1   100      183        363     180
1   80       250        430     80
2   5          0        180     10
2   5         10        190     10
3   5          0        180     65
3   30       100        280     120
3   30       150        330     140
3   30       200        380     110
3   30       280        460     80
3   50       310        490     50

I have tried the code below, but it is not working as intended.

df_d['use']=df_d.groupby('id').apply(lambda x: x.loc[x['days'].between(x['days'],x['days_180']),'supply'].sum())

Upvotes: 2

Views: 42

Answers (2)

Dev Khadka
Dev Khadka

Reputation: 5451

you can also do it using numpy's broadcast and np.where

df.groupby("id").apply(
    lambda g: g.assign(use=(np.where((g.days.values>=g.days.values[:, np.newaxis]) & 
        (g.days.values<=g.days_180.values[:, np.newaxis]), g.Supply, 0).sum(axis=1)))
).reset_index(drop=True)

also jezrael's can be modified like below to use lambda in side apply

df = df.groupby('id').apply(lambda x:
    x.assign(use=[x.loc[(x['days'] <= d) & (x['days_180'] >= d),'Supply'].sum() for d in x['days_180']])
)
print (df)

Upvotes: 0

jezrael
jezrael

Reputation: 862581

Use list comprehension for loop each days_180 values per groups, filter with sum and create new column:

def f(x):
    a = [x.loc[(x['days'] <= d) & (x['days_180'] >= d),'Supply'].sum() for d in x['days_180']]
    x['use'] = a
    return x

Or solution with another lambda:

def f(x):
    x['use'] = x['days_180'].apply(lambda d: x.loc[(x['days'] <= d) & 
                                                   (x['days_180'] >= d), 'Supply'].sum())
    return x


df_d = df_d.groupby('id').apply(f)
print (df_d)
    id  Supply  days  days_180  use
0    1      30     0       180   30
1    1     100   183       363  180
2    1      80   250       430   80
3    2       5     0       180   10
4    2       5    10       190    5
5    3       5     0       180   65
6    3      30   100       280  120
7    3      30   150       330  140
8    3      30   200       380  110
9    3      30   280       460   80
10   3      50   310       490   50

Upvotes: 2

Related Questions