Reputation: 83
I try to filter the data of the customers who have their first bill in 2019, then calculate their number of bills in 2020 and 2021:
DF = pd.DataFrame(
{"ID": ["1","1","2", "2", "2","2","3","3"],
"Year": ["2017", "2019", "2019","2019", "2020", "2020","2019","2021"],
"Price": ["10","0","200","100","4000","3440","3445","2303"]}
)
The result I try to find look like this :
ID 2019 price_2019 2020 price_2020 2021 price_2021
2 2 300 2 7440 0 0
3 1 3445 0 0 1 2303
I can't find a function to do this calculation. Any idea how to make it work?
Upvotes: 1
Views: 60
Reputation: 28644
DF = DF.astype(int)
Filter for rows where the first Year is 2019:
year_2019 = DF.groupby('ID', sort = False).Year.transform('min') == 2019
filtered = DF.loc[year_2019]
Aggregate and rename:
filtered = (filtered
.pivot_table(index='ID',
values='Price',
columns='Year',
aggfunc=['sum', 'size'],
fill_value = 0)
.rename(columns={'sum':'price'}, level=0)
)
# reorganize the column names to match expected output
filtered.columns = [f"{left}_{right}"
if left == 'price'
else right
for left, right in filtered]
filtered
price_2019 price_2020 price_2021 2019 2020 2021
ID
2 300 7440 0 2 2 0
3 3445 0 2303 1 0 1
Upvotes: 1