Lynn
Lynn

Reputation: 4398

Groupby Column in Pandas and Perform calculation (Python)

I have a dataset, df, where I would like to:

  1. Take the average of the TotalB column based upon grouping the TotalB column.
  2. I would then like to take this new column and subtract the free value to obtain the Used value

df1

date        name    rack    TotalB  freeB   
11/20/2020  a       yes     11      5   
11/20/2020  a       yes     10      5   
11/20/2020  a       yes     12      5   
11/20/2020  a       yes     10      5   
11/20/2020  b       no      5       2   
11/20/2020  b       no      5       2   
11/20/2020  b       no      6       2   

Desired Outcome

date        name    rack    TotalB  freeB   UsedB
11/20/2020  a       yes     10.75       5    5.75
11/20/2020  b       no      5.33        2    3.33

What I am doing:

df.groupby('rack')['TotalB'].mean()
UsedB = df["TotalB"] - df["freeB"]

I am having trouble with retaining all the columns within the dataset. Any suggestion is appreciated.

Upvotes: 4

Views: 83

Answers (2)

Kenan
Kenan

Reputation: 14094

Groupby similar columns, calculate the mean then assign UsedB

df.groupby(['date', 'name', 'rack', 'freeB']).agg({'TotalB': 'mean'}).reset_index().assign(UsedB=lambda x: x['TotalB'] - x['freeB'])

         date name rack  freeB     TotalB     UsedB
0  11/20/2020    a  yes      5  10.750000  5.750000
1  11/20/2020    b   no      2   5.333333  3.333333

You could also use the as_index=False option:

df.groupby(["date", "name", "rack"], as_index=False).mean().assign(
    UsedB=lambda x: x.TotalB - x.freeB
)

Upvotes: 1

Ami Tavory
Ami Tavory

Reputation: 76297

From the question, it seems like you want to do something like the following:

df = df1.groupby(['date', 'name', 'rack'], as_index=False).agg({'TotalB': 'mean', 'freeB': 'mean})

which will group by the first 3 columns, and find the mean of the last 2, followed by

df['UsedB'] = df.TotalB - df.freeB

Upvotes: 4

Related Questions