Reputation: 117
I have a dataframe with values as below-
Amount Product DocDate
0 1099.0 1100 2018-01-02
1 1234.0 1100 2018-01-04
2 1000.0 1100 2018-01-06
3 8000.0 1100 2018-01-28
4 3000.0 1100 2018-02-09
5 4500.0 1100 2018-02-20
I need to calculate the sum of amount field calculated on end of each fortnight.
Example:
Product Amount FortNight
1100 3333.0 Jan 1st Fortnight (this could be date format too!!)
1100 3000.0 Feb 2nd Fortnight
1100 4500.0 Feb 1st Fortnight
It would be great if it did not involve loops( since I have loops for product too) I tried separating (grouping) both week and month and calculating, but couldn't get expected results since its a column in dataframe.
Thanks in advance. Appreciate help.
Upvotes: 2
Views: 1237
Reputation: 862581
Need:
#if necessary convert column to datetime
df['DocDate'] = pd.to_datetime(df['DocDate'])
#generate Fortnight https://stackoverflow.com/a/34428879
s = np.where(df['DocDate'].dt.day < 15, '1st Fortnight', '2nd Fortnight')
#create new column
df['FortNight'] = df['DocDate'].dt.strftime('%b ') + s
#aggregate sum
df = df.groupby(['Product','FortNight'], as_index=False, sort=False)['Amount'].sum()
print (df)
Product FortNight Amount
0 1100 Jan 1st Fortnight 3333.0
1 1100 Jan 2nd Fortnight 8000.0
2 1100 Feb 1st Fortnight 3000.0
3 1100 Feb 2nd Fortnight 4500.0
If need datetimes in output:
s = np.where(df['DocDate'].dt.day < 15, '-01', '-15')
df['FortNight'] = pd.to_datetime(df['DocDate'].dt.strftime('%Y-%m') + s)
df = df.groupby(['Product','FortNight'], as_index=False, sort=False)['Amount'].sum()
print (df)
Product FortNight Amount
0 1100 2018-01-01 3333.0
1 1100 2018-01-15 8000.0
2 1100 2018-02-01 3000.0
3 1100 2018-02-15 4500.0
Upvotes: 1
Reputation: 846
First of all, I am a little worried since you said you were using loops with pandas... You should NEVER use loops with pandas they are not made for it and are extremely slow and ineffective. There are some really rare cases where you cannot avoid it, but even then there are options to optimize it better.
To answer your question, you first need to convert the DocDate to datetime format:
from datetime import datetime
df.DocDate = df.DocDate.apply(lambda d: datetime.strptime(d, %Y-%m-%d))
You can then use the datetimeIndex.resample
function which works exactly like groupby but allows you to group data by time constraint :
df = df.set_index('DocDate').resample('2W').Amount.sum()
The resample('2W')
here means group by 2 weeks periods.
Upvotes: 2