Makoto Miyazaki
Makoto Miyazaki

Reputation: 1983

Pandas: groupby and make a new column by concatenating the result

I want to know how to create a new column after groupby and apply sum calculation.

I have a data frame like this.

> df
tour_id  time  condA  condB  condC
      1    10   True   True   True
      1    20   True   True   True
      1    30  False  False  False
      1    40  False  False  False
      2    15   True   True   True
      2    25  False  False  False
      2    30  False  False  False
      2    45  False  False  False
      2    50   True   True   True

I want to

  1. groupby tour_id and take a sum of time column where condA and condB and condC are all False. Like below.

    tour_id    sum
          1     70
          2    100
    
  2. Merge the result of 1 to the original data frame df, filling with the same value in a group, and name this new column as driving.

so the result should look like:

tour_id  time  condA  condB  condC  driving
      1    10   True   True   True       70
      1    20   True   True   True       70
      1    30  False   False False       70
      1    40  False   False False       70
      2    15   True   True   True      100
      2    25  False  False  False      100
      2    30  False  False  False      100
      2    45  False  False  False      100
      2    50   True   True   True      100

My attempt:

temp = df[(df.condA == True)&(df.condB == True) &(df.condC == True)]
df2 = temp.groupby('tour_id').time.sum().reset_index()

But I don't know how to merge df2 to the original df.

Upvotes: 0

Views: 57

Answers (3)

rahlf23
rahlf23

Reputation: 9019

df['driving'] = df['tour_id'].map(df[~df[['condA','condB','condC']].all(1)].groupby('tour_id')['time'].sum())

Yields:

   tour_id  time  condA  condB  condC  driving
0        1    10   True   True   True       70
1        1    20   True   True   True       70
2        1    30  False  False  False       70
3        1    40  False  False  False       70
4        2    15   True   True   True      100
5        2    25  False  False  False      100
6        2    30  False  False  False      100
7        2    45  False  False  False      100
8        2    50   True   True   True      100

Upvotes: 1

BENY
BENY

Reputation: 323226

Using all to filter the data frame

df['driving']=df.tour_id.map(df[(df.iloc[:,-3:]).all(1)].groupby('tour_id').time.sum())

Upvotes: 1

ysearka
ysearka

Reputation: 3855

You can turn your groupby dataframe into a dict, and then map the tour_id column through it:

df['driving'] = df.tour_id.map(temp.groupby('tour_id').time.sum().to_dict())

Upvotes: 0

Related Questions