Reputation: 123
I have a data frame that looks like this (but with 565 date columns):
County 2021-03-01 2021-03-02 2021-03-03 ...
0 Solano 2 3 7
1 Davis 1 5 3
2 Weber 0 4 2
3 Yuba 1 2 1
4 Marin 100 1 4
What I'm trying to do is roll together the values for each day into a monthly sum. It's a very wide table, so I don't really have the patience or desire to do something like this manually. Ultimately, I want my data to look something more like this:
County 2021-03 2021-04 2021-05 ...
0 Solano 12 24 60
1 Davis 9 16 30
2 Weber 6 14 21
3 Yuba 4 12 27
4 Marin 105 200 1021
Or, if it's easier, something like this instead:
County Year_Month Sum
0 Solano 2021-03 12
1 Davis 2021-03 9
2 Weber 2021-03 6
3 Yuba 2021-03 4
4 Marin 2021-03 105
I have looked into using pd.melt()
to make grouping and aggregating the dataframe the way I want to, but I haven't found enough examples of its usage to grasp if that's the best path forward. I found a similar question and answer here, but I couldn't quite understand the application of pd.melt()
without a singular and generic date column.
I'm looking for some help getting to a monthly sum but just need a little more guidance. Any ideas?
Upvotes: 1
Views: 615
Reputation: 24314
If there only 1 string name and others are date like names in columns and your country column contains unique values:
You can try set_index()
+columns attribute+groupby()
:
df=df.set_index('County')
df.columns=pd.to_datetime(df.columns).strftime("%Y-%m")
df=df.groupby(level=0,axis=1).sum().reset_index()
OR
for your updated question use:
df=(df.melt('County',var_name='Year_Month',value_name='Sum')
.assign(Year_Month=lambda x:pd.to_datetime(x['Year_Month']).dt.strftime("%Y-%m"))
.groupby(['County','Year_Month']).sum().reset_index())
Upvotes: 3