dfahsjdahfsudaf
dfahsjdahfsudaf

Reputation: 481

Conditional Group By Statement

I have the data frame :

  PRODUCT  SPEED  HEIGHT  LENGTH      DATE
      30       10    5       8         2019-08
      30       13    9       15        2019-08
      31       19    8       12        2019-08
      30       5     6       3         2019-08
      31       11    8       6         2019-09
      30       11    8       6         2019-09
      30       11    8       6         2019-09
      31       11    8       6         2019-09
      31       11    8       6         2019-09

I want to produce a table that has the averages for SPEED, HEIGHT, LENGTH by month. I also want to have a column in that table that has the percentage of the total PRODUCT per month when PRODUCT=30.

My desired output:

  DATE      SPEED   HEIGHT   LENGTH   PRODUCT
  2019-08    11.75    7        9.5      75%
  2019-09     11      8         6       40%

I have the following code:

avg_summary = df.groupby(['DATE']).agg({'SPEED': 'mean',
                                         'HEIGHT' : 'mean',
                                         'LENGTH': 'mean',
                                         'PRODUCT': 'count'}).reset_index()

This outputs:

  DATE      SPEED   HEIGHT   LENGTH   PRODUCT
  2019-08    11.75    7        9.5      4
  2019-09     11      8         6       5

I know I can do (df[df.PRODUCT == 30].shape[0])to find the count in the whole data frame for when PRODUCT == 30, but I cannot figure out how to count the number of instances for when PRODUCT=30 by month.

I figured I could divide the number of instances for when PRODUCT=30 by the total count of PRODUCT to get the result I am looking for.

Any help/tips would be very appreciated! Thanks.

Upvotes: 1

Views: 60

Answers (3)

manwithfewneeds
manwithfewneeds

Reputation: 1167

You could also write an inline function to pass in for the product column. An f-string can be used if you want the % sign in the column:

df = (df.groupby('DATE', as_index=False)
        .agg({'SPEED': 'mean',
              'HEIGHT': 'mean',
              'LENGTH': 'mean',
              'PRODUCT': lambda x: f'{100*(x.eq(30).sum()/len(x)):.0f}%'})
      )


print(df)

      DATE  SPEED  HEIGHT  LENGTH PRODUCT
0  2019-08  11.75       7     9.5     75%
1  2019-09  11.00       8     6.0     40%

Upvotes: 1

Jonas Juhler
Jonas Juhler

Reputation: 64

You could combine the two things that you were talking about and I think that this will get you what you want.

counts = df[df['PRODUCT']==30].groupby(['DATE']).agg({'PRODUCT': 'count'}).reset_index()

avg_summary['PRODUCT'] = (counts['PRODUCT'] / avg_summary['PRODUCT'])*100

First off you filter on the PRODUCT=30 and then you group by DATE, which gives you the counts by month of PRODUCT=30.

Upvotes: 1

Erfan
Erfan

Reputation: 42886

You were quite close, after your groupby, filter your data on PRODUCT == 30, groupby on date again and then divide the size product by the original avg_summary:

s = df.query('PRODUCT==30').groupby('DATE')['PRODUCT'].size().to_numpy()
avg_summary['PRODUCT'] = s / avg_summary['PRODUCT'] * 100

output

      DATE  SPEED  HEIGHT  LENGTH  PRODUCT
0  2019-08  11.75       7     9.5     75.0
1  2019-09  11.00       8     6.0     40.0

Upvotes: 2

Related Questions