Fazia Chenna
Fazia Chenna

Reputation: 135

Lookup from multiple columns python

I have the following dataframe df:

       ID      03/01/2021     17/01/2021  20/02/2021  03/07/2022   17/07/2022  

  0   ABC             5               6          10           5            5            
  1   BCD             6               7           3           6           10              
  2   DEF            10               4           5           2            8              

I would like to look up for the columns that have the same month and year ,and save the sum of the values in a new dataframe.

Output would be:

date       sum 
01/2021     38     
02/2021     18
07/2022     36
02/2023     19

I do appreciate your help!

Upvotes: 3

Views: 1768

Answers (4)

Pygirl
Pygirl

Reputation: 13349

try transposing the dataframe and then group by year and month.

df = df.drop('ID', axis=1).T
df.index = pd.to_datetime(df.index, format='%d/%m/%Y')
df2 = df.groupby([df.index.year, df.index.month]).sum(1).sum(1)
df2.index = [ f"{m}/{y}" for (y,m) in df2.index ]
df2 = pd.DataFrame({'Date': df2.index, 'sum': df2.values})

df2:

Date    sum
1/2021  38
2/2021  18
7/2022  36
2/2023  19

OR

1st sum up the date column values then groupby and calculate sum.

df1 = df.drop('ID', axis=1).sum(0)
df1.index = pd.to_datetime(df1.index, format='%d/%m/%Y')
df2 = df1.groupby([df1.index.year, df1.index.month]).sum(1)
df2.index = [ f"{m}/{y}" for (y,m) in df2.index ]
df2 = pd.DataFrame(df2, columns=['sum']).rename_axis('Date')

Upvotes: 1

Cameron Abrams
Cameron Abrams

Reputation: 7

I pasted

ID      03/01/2021     17/01/2021  20/02/2021  03/07/2022   17/07/2022  20/02/2023
ABC             5               6          10           5            5           6 
BCD             6               7           3           6           10           4     
DEF            10               4           5           2            8           9     

into a file some_df.in. Then this little program gives you the output you want:

import pandas as pd
df = pd.read_csv('some_df.in',sep='\s+')
my = dict()
for d in df.columns[1:]:
    day,mo,yr=d.split('/')
    key=mo+'/'+yr
    if key in my:
        my[key].append(d)
    else:
        my[key]=[d]
print('date      sum')
for moyr,col in my.items():
    tot=0
    for c in col:
        tot+=sum(df[d].to_list())
    print('{:s}    {:d}'.format(moyr,tot))

Upvotes: 0

Mayank Porwal
Mayank Porwal

Reputation: 34056

Use df.stack with df.groupby and pd.to_datetime:

In [419]: x = df.stack().reset_index().query('level_1 != "ID"')
In [426]: x.level_1 = pd.to_datetime(x.level_1, format='%d/%m/%Y')

In [431]: y = x.groupby([x.level_1.dt.year, x.level_1.dt.month])[0].sum()
In [445]: d = [str(j) + '/' + str(i) for i,j in y.index]

In [448]: ans = pd.DataFrame({'date': d, 'sum': y.values})

In [449]: ans
Out[449]: 
     date  sum
0  1/2021   38
1  2/2021   18
2  7/2022   36
3  2/2023   19

Upvotes: 1

MusHusKat
MusHusKat

Reputation: 438

Try this:

# df = your original dataframe 

mth_year_sums = dict()

for date, col in df.iteritems():
    mth_year = date.split("/", maxsplit=1)[-1]
    count = mth_year_sums.get(mth_year, 0)
    mth_year_sums[mth_year] = count + sum(col)

result = pd.DataFrame({'sum': mth_year_sums})

# rename index name to 'date'
result.index.name = 'date'
print(result)

Upvotes: 0

Related Questions