Vincent
Vincent

Reputation: 1647

Pandas groupby sum if value in group

Sample of data, actual data has many years. The type "Lien" or "Lien Endorsement" can only appear once per year. Other types can repeat within a year.

tax_allyears =

tax_year    type                amount  
2013        Lien Interest       4
2014        Lien Interest       10
2014        Lien                100
2014        Lien Interest       15
2013        Lien Endorsement    200

This line almost works, it sums the "Lien Interest" values by year.

by_year_interest = tax_allyears_1[tax_allyears_1['type'] == 'Lien Interest'].groupby(by=['tax_year'])['amount'].sum()

What I want is to distinguish between years that have "Lien" vs "Lien Interest"

by_year_Lien_interest = some function

tax_year    amount
2014        25

by_year_Lien_Endorsement_interest = some function

tax_year    amount
2013        4

Upvotes: 3

Views: 4841

Answers (2)

Joe T. Boka
Joe T. Boka

Reputation: 6589

If tax_year, type and amount are the names of the columns in a DataFrame, then you can do this:

# Create a groupby object
name = df.groupby(['tax_year', 'type'])

# Apply the sum function to the groupby object
df = name.sum()

Upvotes: 0

Anand S Kumar
Anand S Kumar

Reputation: 90869

You can first create two different list of unique years, one where Lien comes, and the other where Lien Endorsement comes. And then use those unique lists in your condition for filtering the tax_allyears DataFrame using Series.isin . Example -

lienyears = tax_allyears.loc[tax_allyears['type'] == 'Lien','tax_year'].unique().tolist()
lienendorsementyears = tax_allyears.loc[tax_allyears['type'] == 'Lien Endorsement','tax_year'].unique().tolist()

by_year_lien_interest = tax_allyears[(tax_allyears['type'] == 'Lien Interest') & tax_allyears['tax_year'].isin(lienyears)].groupby('tax_year')['amount'].sum()
by_year_lien_endorsement_interest = tax_allyears[(tax_allyears['type'] == 'Lien Interest') & tax_allyears['tax_year'].isin(lienendorsementyears)].groupby('tax_year')['amount'].sum()

Demo -

In [7]: tax_allyears
Out[7]:
   tax_year              type  amount
0      2013     Lien Interest       4
1      2014     Lien Interest      10
2      2014              Lien     100
3      2014     Lien Interest      15
4      2013  Lien Endorsement     200

In [9]: lienyears = tax_allyears.loc[tax_allyears['type'] == 'Lien','tax_year'].unique().tolist()

In [10]: lienendorsementyears = tax_allyears.loc[tax_allyears['type'] == 'Lien Endorsement','tax_year'].unique().tolist()

In [13]: by_year_lien_interest = tax_allyears[(tax_allyears['type'] == 'Lien Interest') & tax_allyears['tax_year'].isin(lienyears)].groupby('tax_year')['amount'].sum()

In [15]: by_year_lien_endorsement_interest = tax_allyears[(tax_allyears['type'] == 'Lien Interest') & tax_allyears['tax_year'].isin(lienendorsementyears)].groupby('tax_year')['amount'].sum()

In [16]: by_year_lien_interest
Out[16]:
tax_year
2014    25
Name: amount, dtype: int64

In [17]: by_year_lien_endorsement_interest
Out[17]:
tax_year
2013    4
Name: amount, dtype: int64

Upvotes: 1

Related Questions