Charlotte
Charlotte

Reputation: 11

Sales per month per product id

I want to have a list of all product sales from 2020-01-01 until 2020-12-01 for sales of each id_main_product (1-100). Would it be better to first create some product groups (e.g. id_main_product from 1-10,11-20 etc.) as I think that the output would be a big table ?

I have already tried out some things with groupby etc. but I can´t find the solution.

This is the dataset:

casesalesdata = pd.read_csv("https://raw.githubusercontent.com/fredzett/Notebooks/master/01_case_sales_data.csv")

Upvotes: 0

Views: 392

Answers (2)

Rob Raymond
Rob Raymond

Reputation: 31206

  • it's not clear the analysis you wish to complete
  • first step is filter to rows you want
  • you can then generate summaries with groupby() and agg()
casesalesdata = pd.read_csv(
    "https://raw.githubusercontent.com/fredzett/Notebooks/master/01_case_sales_data.csv"
)

casesalesdata["date"] = pd.to_datetime(casesalesdata["date"])
# just dates and wanted products
df = casesalesdata.loc[
    casesalesdata["date"].between("2020-01-01", "2020-12-01")
    & casesalesdata["id_main_product"].between(1, 100)
]

df.groupby([df["date"].dt.month, "id_main_product"]).agg(
    {
        "items": "sum",
        "margin_perc": "mean",
        "avg_price_per_item": "mean",
        "rating": "mean",
    }
)


Upvotes: 1

Bengu Atici
Bengu Atici

Reputation: 17

First, you should change the data type of date column to datetime and format as %Y-%m to use later in pivot. Then, you can use pd.pivot.

casesalesdata.date=pd.to_datetime(casesalesdata.date)
casesalesdata.date=casesalesdata.date.dt.strftime('%Y-%m')
pd.pivot_table(casesalesdata,index=["date","id_main_product"],
               values=["items"],
               aggfunc='sum')

You may also refer to Group by Year and Month Panda Pivot Table

Upvotes: 0

Related Questions