Reputation: 135
I have the following dataframe df:
ID 03/01/2021 17/01/2021 20/02/2021 03/07/2022 17/07/2022
0 ABC 5 6 10 5 5
1 BCD 6 7 3 6 10
2 DEF 10 4 5 2 8
I would like to look up for the columns that have the same month and year ,and save the sum of the values in a new dataframe.
Output would be:
date sum
01/2021 38
02/2021 18
07/2022 36
02/2023 19
I do appreciate your help!
Upvotes: 3
Views: 1768
Reputation: 13349
try transposing the dataframe and then group by year
and month
.
df = df.drop('ID', axis=1).T
df.index = pd.to_datetime(df.index, format='%d/%m/%Y')
df2 = df.groupby([df.index.year, df.index.month]).sum(1).sum(1)
df2.index = [ f"{m}/{y}" for (y,m) in df2.index ]
df2 = pd.DataFrame({'Date': df2.index, 'sum': df2.values})
df2:
Date sum
1/2021 38
2/2021 18
7/2022 36
2/2023 19
OR
1st sum up the date column values then groupby and calculate sum.
df1 = df.drop('ID', axis=1).sum(0)
df1.index = pd.to_datetime(df1.index, format='%d/%m/%Y')
df2 = df1.groupby([df1.index.year, df1.index.month]).sum(1)
df2.index = [ f"{m}/{y}" for (y,m) in df2.index ]
df2 = pd.DataFrame(df2, columns=['sum']).rename_axis('Date')
Upvotes: 1
Reputation: 7
I pasted
ID 03/01/2021 17/01/2021 20/02/2021 03/07/2022 17/07/2022 20/02/2023
ABC 5 6 10 5 5 6
BCD 6 7 3 6 10 4
DEF 10 4 5 2 8 9
into a file some_df.in
. Then this little program gives you the output you want:
import pandas as pd
df = pd.read_csv('some_df.in',sep='\s+')
my = dict()
for d in df.columns[1:]:
day,mo,yr=d.split('/')
key=mo+'/'+yr
if key in my:
my[key].append(d)
else:
my[key]=[d]
print('date sum')
for moyr,col in my.items():
tot=0
for c in col:
tot+=sum(df[d].to_list())
print('{:s} {:d}'.format(moyr,tot))
Upvotes: 0
Reputation: 34056
Use df.stack
with df.groupby
and pd.to_datetime
:
In [419]: x = df.stack().reset_index().query('level_1 != "ID"')
In [426]: x.level_1 = pd.to_datetime(x.level_1, format='%d/%m/%Y')
In [431]: y = x.groupby([x.level_1.dt.year, x.level_1.dt.month])[0].sum()
In [445]: d = [str(j) + '/' + str(i) for i,j in y.index]
In [448]: ans = pd.DataFrame({'date': d, 'sum': y.values})
In [449]: ans
Out[449]:
date sum
0 1/2021 38
1 2/2021 18
2 7/2022 36
3 2/2023 19
Upvotes: 1
Reputation: 438
Try this:
# df = your original dataframe
mth_year_sums = dict()
for date, col in df.iteritems():
mth_year = date.split("/", maxsplit=1)[-1]
count = mth_year_sums.get(mth_year, 0)
mth_year_sums[mth_year] = count + sum(col)
result = pd.DataFrame({'sum': mth_year_sums})
# rename index name to 'date'
result.index.name = 'date'
print(result)
Upvotes: 0