Kevin
Kevin

Reputation: 123

Group Daily Date Columns by Month and Sum Values

I have a data frame that looks like this (but with 565 date columns):

    County  2021-03-01 2021-03-02 2021-03-03 ...
0   Solano           2          3          7
1    Davis           1          5          3
2    Weber           0          4          2
3     Yuba           1          2          1
4    Marin         100          1          4

What I'm trying to do is roll together the values for each day into a monthly sum. It's a very wide table, so I don't really have the patience or desire to do something like this manually. Ultimately, I want my data to look something more like this:

    County  2021-03 2021-04 2021-05 ...
0   Solano       12      24      60
1    Davis        9      16      30
2    Weber        6      14      21
3     Yuba        4      12      27
4    Marin      105     200    1021

Or, if it's easier, something like this instead:

    County  Year_Month  Sum
0   Solano     2021-03   12
1    Davis     2021-03    9
2    Weber     2021-03    6
3     Yuba     2021-03    4
4    Marin     2021-03  105

I have looked into using pd.melt() to make grouping and aggregating the dataframe the way I want to, but I haven't found enough examples of its usage to grasp if that's the best path forward. I found a similar question and answer here, but I couldn't quite understand the application of pd.melt() without a singular and generic date column.

I'm looking for some help getting to a monthly sum but just need a little more guidance. Any ideas?

Upvotes: 1

Views: 615

Answers (1)

Anurag Dabas
Anurag Dabas

Reputation: 24314

If there only 1 string name and others are date like names in columns and your country column contains unique values:

You can try set_index()+columns attribute+groupby() :

df=df.set_index('County')
df.columns=pd.to_datetime(df.columns).strftime("%Y-%m")
df=df.groupby(level=0,axis=1).sum().reset_index()

OR

for your updated question use:

df=(df.melt('County',var_name='Year_Month',value_name='Sum')
      .assign(Year_Month=lambda x:pd.to_datetime(x['Year_Month']).dt.strftime("%Y-%m"))
      .groupby(['County','Year_Month']).sum().reset_index())

Upvotes: 3

Related Questions