Sparsha Devapalli
Sparsha Devapalli

Reputation: 117

Python Pandas Dataframe- calculate the sum of values fortnightly

I have a dataframe with values as below-

               Amount  Product  DocDate
0              1099.0   1100 2018-01-02
1              1234.0   1100 2018-01-04
2              1000.0   1100 2018-01-06
3              8000.0   1100 2018-01-28
4              3000.0   1100 2018-02-09
5              4500.0   1100 2018-02-20

I need to calculate the sum of amount field calculated on end of each fortnight.

Example:

Product Amount FortNight
1100    3333.0  Jan 1st Fortnight (this could be date format too!!)
1100    3000.0  Feb 2nd Fortnight
1100    4500.0  Feb 1st Fortnight

It would be great if it did not involve loops( since I have loops for product too) I tried separating (grouping) both week and month and calculating, but couldn't get expected results since its a column in dataframe.

Thanks in advance. Appreciate help.

Upvotes: 2

Views: 1237

Answers (2)

jezrael
jezrael

Reputation: 862581

Need:

#if necessary convert column to datetime
df['DocDate'] = pd.to_datetime(df['DocDate'])
#generate Fortnight https://stackoverflow.com/a/34428879
s =  np.where(df['DocDate'].dt.day < 15, '1st Fortnight', '2nd Fortnight')
#create new column
df['FortNight'] = df['DocDate'].dt.strftime('%b ') + s
#aggregate sum
df = df.groupby(['Product','FortNight'], as_index=False, sort=False)['Amount'].sum()
print (df)
   Product          FortNight  Amount
0     1100  Jan 1st Fortnight  3333.0
1     1100  Jan 2nd Fortnight  8000.0
2     1100  Feb 1st Fortnight  3000.0
3     1100  Feb 2nd Fortnight  4500.0

If need datetimes in output:

s =  np.where(df['DocDate'].dt.day < 15, '-01', '-15')
df['FortNight'] = pd.to_datetime(df['DocDate'].dt.strftime('%Y-%m') + s)

df = df.groupby(['Product','FortNight'], as_index=False, sort=False)['Amount'].sum()
print (df)
   Product  FortNight  Amount
0     1100 2018-01-01  3333.0
1     1100 2018-01-15  8000.0
2     1100 2018-02-01  3000.0
3     1100 2018-02-15  4500.0

Upvotes: 1

Zuma
Zuma

Reputation: 846

First of all, I am a little worried since you said you were using loops with pandas... You should NEVER use loops with pandas they are not made for it and are extremely slow and ineffective. There are some really rare cases where you cannot avoid it, but even then there are options to optimize it better.

To answer your question, you first need to convert the DocDate to datetime format:

from datetime import datetime
df.DocDate = df.DocDate.apply(lambda d: datetime.strptime(d, %Y-%m-%d))

You can then use the datetimeIndex.resample function which works exactly like groupby but allows you to group data by time constraint :

df = df.set_index('DocDate').resample('2W').Amount.sum()

The resample('2W') here means group by 2 weeks periods.

Upvotes: 2

Related Questions