Zabman
Zabman

Reputation: 117

Creating a new column based on a group-by and condition of other columns

I am using a dataframe setup as follows:

| date | Product | Region | Age |

| 01/12/20 | Sales | NW | 17 |
| 01/12/20 | Sales | NW | 30 |
| 01/11/20 | Sales | SW | 32 |
| 01/12/20 | Import | SW | 5 |
| 01/12/20 | Import | SW | 45 |
| 01/12/20 | Sales | NW | 31 |
| 01/12/20 | Import | NW | 2 |

I would like to create a new dataframe showing the count of rows that have an age > 30, that are grouped by date/product/region. ie:

date Product Region aged
01/12/20 Sales NW 2
01/11/20 Sales SW 1
01/12/20 Import SW 1

I have tried a number of different approaches, the latest using lambda but so far unsuccessful:

df['aged'] = df.groupby([pd.Grouper(freq="M"), 'Product', 'Region'])['Product'].transform(lambda x: count( x ) if df['age'] >= 30 else 0)

Is anyone able to help and show me where I am going wrong?

Upvotes: 0

Views: 1499

Answers (3)

sammywemmy
sammywemmy

Reputation: 28659

Alternatives to the answer already proposed:

The simplest, in my opinion, would be to filter before grouping, just as the others; where I deviate is using as_index=False, and using size for aggregation:

(df.loc[df.Age.ge(30)]
  .groupby(["date", "Product", "Region"], sort = False, as_index=False)
  .agg(Aged=("Age", "size"))
)


    date    Product Region  Aged
0   01/12/20    Sales   NW  2
1   01/11/20    Sales   SW  1
2   01/12/20    Import  SW  1

This one uses a groupby, with the boolean condition included in the groupby:

(df.groupby(["date", "Product", "Region", df.Age.ge(30)], sort = False)
  .size()
  .drop(False, level="Age")
  .droplevel("Age")
  .reset_index(name="Aged")
 )

Another option is using crosstab, again with the boolean condition:

(pd.crosstab([df.date, df.Product, df.Region], df.Age.ge(30))
   .iloc[:, -1]
   .loc[lambda x: x != 0]
   .reset_index(name='Aged'))

    date    Product Region  Aged
0   01/11/20    Sales   SW  1
1   01/12/20    Import  SW  1
2   01/12/20    Sales   NW  2

Upvotes: 0

ansev
ansev

Reputation: 30920

I think you need:

new_df = (df[df['Age'].ge(30)].groupby(df.columns.difference(['Age']).tolist())['Age']
                              .count()
                              .reset_index(name='aged'))
print(new_df)


  Product Region      date  aged
0  Import     SW  01/12/20     1
1   Sales     NW  01/12/20     2
2   Sales     SW  01/11/20     1

Upvotes: 2

wwnde
wwnde

Reputation: 26676

Please Try:

df[df['Age'].ge(30)].groupby(['date', 'Product', 'Region'])\
['Age'].count().to_frame('aged')



                         aged
date     Product Region      
01/11/20 Sales   SW         1
01/12/20 Import  SW         1
         Sales   NW         2

Upvotes: 1

Related Questions