Count number of occurence and sum according to condition of year

Question

I try to filter the data of the customers who have their first bill in 2019, then calculate their number of bills in 2020 and 2021:

DF = pd.DataFrame(
    {"ID": ["1","1","2", "2", "2","2","3","3"],
     "Year": ["2017", "2019", "2019","2019", "2020", "2020","2019","2021"],
     "Price": ["10","0","200","100","4000","3440","3445","2303"]}
)

The result I try to find look like this :

ID       2019  price_2019  2020  price_2020  2021  price_2021
2        2     300         2     7440        0     0
3        1     3445        0     0           1     2303

I can't find a function to do this calculation. Any idea how to make it work?

sammywemmy · Accepted Answer

DF = DF.astype(int)

Filter for rows where the first Year is 2019:

year_2019 = DF.groupby('ID', sort = False).Year.transform('min') == 2019
filtered = DF.loc[year_2019]

Aggregate and rename:

filtered = (filtered
            .pivot_table(index='ID', 
                         values='Price', 
                         columns='Year', 
                         aggfunc=['sum', 'size'],
                         fill_value = 0)
            .rename(columns={'sum':'price'}, level=0)
             )

# reorganize the column names to match expected output
filtered.columns = [f"{left}_{right}" 
                    if left == 'price' 
                    else right 
                    for left, right in filtered]

filtered

    price_2019  price_2020  price_2021  2019  2020  2021
ID                                                      
2          300        7440           0     2     2     0
3         3445           0        2303     1     0     1

Count number of occurence and sum according to condition of year

Answers (1)

Related Questions