Test
Test

Reputation: 550

Calculation of the sales in the last 6 months

I have a problem. I would like to calculate the turnover customer-specifically for the last 6 months. I have already calculated when the "end date" is (i.e. until where the 6 months go). I would now like to calculate per row and customer-specifically what he has purchased from the purchasing department in the last 6 months.

Dataframe

   customerId   fromDate  sales
0           1 2022-06-01    100
1           1 2022-05-25     20
2           1 2022-05-25     50
3           1 2022-05-20     30
4           1 2021-09-05     40
5           2 2022-06-02     80
6           3 2021-03-01     50
7           3 2021-02-01     20
import pandas as pd


d = {'customerId': [1, 1, 1, 1, 1, 2, 3, 3],
     'fromDate': ["2022-06-01", "2022-05-25", "2022-05-25", "2022-05-20", "2021-09-05",
                  "2022-06-02", "2021-03-01", "2021-02-01"],
     'sales': [100, 20, 50, 30, 40, 80, 50, 20]
    }
df = pd.DataFrame(data=d)

df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')

from datetime import date
from dateutil.relativedelta import relativedelta
def find_last_date(date):
  six_months = date + relativedelta(months=-6)
  return six_months

#df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
df['last_month'] = df['fromDate'].apply(lambda x: find_last_date(x))

What I have

   customerId   fromDate  sales last_month 
0           1 2022-06-01    100 2022-03-01        
1           1 2022-05-25     20 2022-02-25        
2           1 2022-05-25     50 2022-02-25        
3           1 2022-05-20     30 2022-02-20        
4           1 2021-09-05     40 2021-06-05        
5           2 2022-06-02     80 2022-03-02        
6           3 2021-03-01     50 2020-12-01        
7           3 2021-02-01     20 2020-11-01        

What I want

   customerId   fromDate  sales last_month total_sales
0           1 2022-06-01    100 2022-03-01        200 # 100 + 20 + 50 + 30
1           1 2022-05-25     20 2022-02-25        100 # 20 + 50 + 30
2           1 2022-05-25     50 2022-02-25        100 # 50 + 20 + 30
3           1 2022-05-20     30 2022-02-20        30  # 30
4           1 2021-09-05     40 2021-06-05        40  # 40
5           2 2022-06-02     80 2022-03-02        80  # 80
6           3 2021-03-01     50 2020-12-01        70  # 50 + 20
7           3 2021-02-01     20 2020-11-01        20  # 20

Upvotes: 0

Views: 534

Answers (1)

gistBatch
gistBatch

Reputation: 68

You can basically use sum() in combination with a location that fits your criterion:

from datetime import datetime
from dateutil.relativedelta import relativedelta

import pandas as pd


def find_last_date(date_: datetime) -> datetime:
    six_months = date_ + relativedelta(months=-6)
    return six_months


def sum_func(row: pd.DataFrame, df: pd.DataFrame) -> int :
    return df[
            (df["customerId"] == row["customerId"])
             & (row["fromDate"] + relativedelta(months=-6)<= df["fromDate"])
             & (df["fromDate"]   <= row["fromDate"])
        ]["sales"].sum()



d = {
    "customerId": [1, 1, 1, 1, 1, 2, 3, 3],
    "fromDate": [
        "2022-06-01",
        "2022-05-25",
        "2022-05-25",
        "2022-05-20",
        "2021-09-05",
        "2022-06-02",
        "2021-03-01",
        "2021-02-01",
    ],
    "sales": [100, 20, 50, 30, 40, 80, 50, 20],
}
df = pd.DataFrame(data=d)

df["fromDate"] = pd.to_datetime(df["fromDate"], errors="coerce")
df["last_month"] = df["fromDate"].apply(find_last_date)
df["total_sales"]=df[["customerId", "fromDate"]].apply(lambda x: sum_func(x, df), axis=1)
print(df)

I cleaned your code a little bit. Some remarks:

  • You should avoid local variables named like imported stuff i.e.date
  • All imports should be at the beginning of your code
  • Your columns have now mixed names between camelcase and snakecase
  • You can pass functions directly to apply, no lambda necessary

If you don't need the "last_month" column you could also completely ommit it. It also conflicts with naming.

Upvotes: 1

Related Questions