Reputation: 550
I have a problem. I would like to calculate the turnover customer-specifically for the last 6 months. I have already calculated when the "end date" is (i.e. until where the 6 months go). I would now like to calculate per row and customer-specifically what he has purchased from the purchasing department in the last 6 months.
Dataframe
customerId fromDate sales
0 1 2022-06-01 100
1 1 2022-05-25 20
2 1 2022-05-25 50
3 1 2022-05-20 30
4 1 2021-09-05 40
5 2 2022-06-02 80
6 3 2021-03-01 50
7 3 2021-02-01 20
import pandas as pd
d = {'customerId': [1, 1, 1, 1, 1, 2, 3, 3],
'fromDate': ["2022-06-01", "2022-05-25", "2022-05-25", "2022-05-20", "2021-09-05",
"2022-06-02", "2021-03-01", "2021-02-01"],
'sales': [100, 20, 50, 30, 40, 80, 50, 20]
}
df = pd.DataFrame(data=d)
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
from datetime import date
from dateutil.relativedelta import relativedelta
def find_last_date(date):
six_months = date + relativedelta(months=-6)
return six_months
#df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
df['last_month'] = df['fromDate'].apply(lambda x: find_last_date(x))
What I have
customerId fromDate sales last_month
0 1 2022-06-01 100 2022-03-01
1 1 2022-05-25 20 2022-02-25
2 1 2022-05-25 50 2022-02-25
3 1 2022-05-20 30 2022-02-20
4 1 2021-09-05 40 2021-06-05
5 2 2022-06-02 80 2022-03-02
6 3 2021-03-01 50 2020-12-01
7 3 2021-02-01 20 2020-11-01
What I want
customerId fromDate sales last_month total_sales
0 1 2022-06-01 100 2022-03-01 200 # 100 + 20 + 50 + 30
1 1 2022-05-25 20 2022-02-25 100 # 20 + 50 + 30
2 1 2022-05-25 50 2022-02-25 100 # 50 + 20 + 30
3 1 2022-05-20 30 2022-02-20 30 # 30
4 1 2021-09-05 40 2021-06-05 40 # 40
5 2 2022-06-02 80 2022-03-02 80 # 80
6 3 2021-03-01 50 2020-12-01 70 # 50 + 20
7 3 2021-02-01 20 2020-11-01 20 # 20
Upvotes: 0
Views: 534
Reputation: 68
You can basically use sum()
in combination with a location that fits your criterion:
from datetime import datetime
from dateutil.relativedelta import relativedelta
import pandas as pd
def find_last_date(date_: datetime) -> datetime:
six_months = date_ + relativedelta(months=-6)
return six_months
def sum_func(row: pd.DataFrame, df: pd.DataFrame) -> int :
return df[
(df["customerId"] == row["customerId"])
& (row["fromDate"] + relativedelta(months=-6)<= df["fromDate"])
& (df["fromDate"] <= row["fromDate"])
]["sales"].sum()
d = {
"customerId": [1, 1, 1, 1, 1, 2, 3, 3],
"fromDate": [
"2022-06-01",
"2022-05-25",
"2022-05-25",
"2022-05-20",
"2021-09-05",
"2022-06-02",
"2021-03-01",
"2021-02-01",
],
"sales": [100, 20, 50, 30, 40, 80, 50, 20],
}
df = pd.DataFrame(data=d)
df["fromDate"] = pd.to_datetime(df["fromDate"], errors="coerce")
df["last_month"] = df["fromDate"].apply(find_last_date)
df["total_sales"]=df[["customerId", "fromDate"]].apply(lambda x: sum_func(x, df), axis=1)
print(df)
I cleaned your code a little bit. Some remarks:
date
lambda
necessaryIf you don't need the "last_month"
column you could also completely ommit it. It also conflicts with naming.
Upvotes: 1