Makoto Miyazaki
Makoto Miyazaki

Reputation: 1983

Pandas: groupby and make a new column applying aggregate to two columns

I'm having a difficulty with applying agg to a groupby pandas dataframe.

I have a dataframe df like this:

order_id    distance_theo    bird_distance 
      10              100               80
      10               80               80
      10               70               80
      11               90               70
      11               70               70
      11               60               70
      12              200              180
      12              150              180
      12              100              180
      12               60              180

I want to groupby order_id, and make a new column crow by dividing distance_theo of the first row in each group by bird_distance in the first row of each group(or in any row, because there is only one value of bird_distance in one group).

order_id    distance_theo    bird_distance    crow
      10              100               80    1.25
      10               80               80    1.25
      10               70               80    1.25
      11               90               70    1.29
      11               70               70    1.29
      11               60               70    1.29
      12              200              180    1.11
      12              150              180    1.11
      12              100              180    1.11
      12               60              180    1.11

My attempt: df.groupby('order_id').agg({'crow', lambda x: x.distance_theo.head(1) / x.bird_distance.head(1)})

But I get an error:

'Series' object has no attribute 'distance_theo'

How can I solve this? Thanks for any kinds of advice!

Upvotes: 2

Views: 794

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

You could do it without groupby and use drop_duplicate and join:

df.join(df.drop_duplicates('order_id')\
  .eval('crow = distance_theo / bird_distance')[['crow']]).ffill()

or use assign instead of eval per @jezraela comments below:

df1.join(df1.drop_duplicates('order_id')\
   .assign(crow=df1.distance_theo / df1.bird_distance)[['crow']]).ffill()

Output:

   order_id  distance_theo  bird_distance      crow
0        10            100             80  1.250000
1        10             80             80  1.250000
2        10             70             80  1.250000
3        11             90             70  1.285714
4        11             70             70  1.285714
5        11             60             70  1.285714
6        12            200            180  1.111111
7        12            150            180  1.111111
8        12            100            180  1.111111
9        12             60            180  1.111111

Upvotes: 2

user3483203
user3483203

Reputation: 51165

Using groupby with first:

s = df.groupby('order_id').transform('first')
df.assign(crow=s.distance_theo.div(s.bird_distance))

   order_id  distance_theo  bird_distance      crow
0        10            100             80  1.250000
1        10             80             80  1.250000
2        10             70             80  1.250000
3        11             90             70  1.285714
4        11             70             70  1.285714
5        11             60             70  1.285714
6        12            200            180  1.111111
7        12            150            180  1.111111
8        12            100            180  1.111111
9        12             60            180  1.111111

Upvotes: 5

Related Questions