Danish
Danish

Reputation: 2871

groupby sum month wise on date time data

I have a transaction data as shown below. which is a 3 months data.

Card_Number Card_type   Category    Amount  Date
0   1       PLATINUM    GROCERY      100    10-Jan-18
1   1       PLATINUM    HOTEL        2000   14-Jan-18
2   1       PLATINUM    GROCERY      500    17-Jan-18
3   1       PLATINUM    GROCERY      300    20-Jan-18
4   1       PLATINUM    RESTRAUNT    400    22-Jan-18
5   1       PLATINUM    HOTEL        500    5-Feb-18
6   1       PLATINUM    GROCERY      400    11-Feb-18
7   1       PLATINUM    RESTRAUNT    600    21-Feb-18
8   1       PLATINUM    GROCERY      800    17-Mar-18
9   1       PLATINUM    GROCERY      200    21-Mar-18
10  2       GOLD        GROCERY      1000   12-Jan-18
11  2       GOLD        HOTEL        3000   14-Jan-18
12  2       GOLD        RESTRAUNT    500    19-Jan-18
13  2       GOLD        GROCERY      300    20-Jan-18
14  2       GOLD        GROCERY      400    25-Jan-18
15  2       GOLD        HOTEL        1500   5-Feb-18
16  2       GOLD        GROCERY      400    11-Feb-18
17  2       GOLD        RESTRAUNT    600    21-Mar-18
18  2       GOLD        GROCERY      200    21-Mar-18
19  2       GOLD        HOTEL        700    25-Mar-18
20  3       SILVER      RESTRAUNT    1000   13-Jan-18
21  3       SILVER      HOTEL        1000   16-Jan-18
22  3       SILVER      GROCERY      500    18-Jan-18
23  3       SILVER      GROCERY      300    23-Jan-18
24  3       SILVER      GROCERY      400    28-Jan-18
25  3       SILVER      HOTEL        500    5-Feb-18
26  3       SILVER      GROCERY      400    11-Feb-18
27  3       SILVER      HOTEL        600    25-Mar-18
28  3       SILVER      GROCERY      200    29-Mar-18
29  3       SILVER      RESTRAUNT    700    30-Mar-18

I am struggling to get below dataframe.

  Card_No Card_Type  D   Jan_Sp Jan_N Feb_Sp Feb_N Mar_Sp  GR_T  RES_T 
    1     PLATINUM   70  3300   5     1500   3     1000    2300  100
    2     GOLD       72  5200   5     1900   2     1500    2300  1100
    3     SILVER .   76  2900   5     900    2     1500    1800  1700

D = Duration in days from first transaction to last transaction.

Jan_Sp = Total spending on January.

Feb_Sp = Total spending on February.

Mar_Sp = Total spending on March.

Jan_N = Number of transaction in Jan.

Feb_N = Number of transaction in Feb.

GR_T = Total spending on GROCERY.

RES_T = Total spending on RESTRAUNT.

I tried following code. I am very new to pandas.

q9['Date'] = pd.to_datetime(Card_Number['Date'])
q9 = q9.sort_values(['Card_Number', 'Date'])
q9['D'] = q9.groupby('ID')['Date'].diff().dt.days

Upvotes: 0

Views: 67

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150745

My approach is three steps

  1. get the date range
  2. get the Monthly spending
  3. get the category spending

Step 1: Date

date_df = df.groupby('Card_type').Date.apply(lambda x: (x.max()-x.min()).days)

Step 2: Month

month_df = (df.groupby(['Card_type', df.Date.dt.month_name().str[:3]])
            .Amount
            .agg({'sum','count'})
            .rename({'sum':'_Sp', 'count': '_N'}, axis=1)
            .unstack('Date')
           )

# rename
month_df.columns = [b+a for a,b in month_df.columns]

Step 3: Category

cat_df = df.pivot_table(index='Card_type', 
                        columns='Category',
                        values='Amount', 
                        aggfunc='sum')

# rename
cat_df.columns = [a[:2]+"_T" for a in cat_df.columns]

And finally concat:

pd.concat( (date_df, month_df, cat_df), axis=1)

gives:

           Date  Feb_Sp  Jan_Sp  Mar_Sp  Feb_N  Jan_N  Mar_N  GR_T  HO_T  RE_T
Card_type                                                                     
GOLD         72    1900    5200    1500      2      5      3  2300  5200  1100
PLATINUM     70    1500    3300    1000      3      5      2  2300  2500  1000
SILVER       76     900    3200    1500      2      5      3  1800  2100  1700

If your data have several years, and you want to separate them by year, then you can add df.Date.dt.year in each groupby above:

date_df = df.groupby([df.Date.dt.year,'Card_type']).Date.apply(lambda x: (x.max()-x.min()).days)
month_df = (df.groupby([df.Date.dt.year,'Card_type', df.Date.dt.month_name().str[:3]])
            .Amount
            .agg({'sum','count'})
            .rename({'sum':'_Sp', 'count': '_N'}, axis=1)
            .unstack(level=-1)
           )

# rename
month_df.columns = [b+a for a,b in month_df.columns]

cat_df = (df.groupby([df.Date.dt.year,'Card_type', 'Category'])
            .Amount
            .sum()
            .unstack(level=-1)
         )

# rename
cat_df.columns = [a[:2]+"_T" for a in cat_df.columns]

pd.concat((date_df, month_df, cat_df), axis=1)

gives:

                Date  Feb_Sp  Jan_Sp  Mar_Sp  Feb_N  Jan_N  Mar_N  GR_T  HO_T  
Date Card_type                                                                  
2017 GOLD         72    1900    5200    1500      2      5      3  2300  5200   
     PLATINUM     70    1500    3300    1000      3      5      2  2300  2500   
     SILVER       76     900    3200    1500      2      5      3  1800  2100   
2018 GOLD         72    1900    5200    1500      2      5      3  2300  5200   
     PLATINUM     70    1500    3300    1000      3      5      2  2300  2500   
     SILVER       76     900    3200    1500      2      5      3  1800  2100  

I would recommend keeping the dataframe this way, so you can access the annual data, e.g. result_df.loc[2017] gives you 2017 data. If you really want 2017 as year, you can do result_df.unstack(level=0).

Upvotes: 2

Related Questions