Reputation: 23
I already a group by done with getting previous row values in a data frame.
groupby_result = df.groupby([df['DateAssigned'].dt.strftime('%m').rename('Month')]).agg({'count'}).cumsum().reset_index()
The above lines give me a result like this
Month DateAssigned DateCompleted
05 1 0
06 18 4
So the code gets the cumulative values for the months that are only present. I want the cumulative values to be carried forwarded even if a month doesn't have a record for it.
my expected result :
Month DateAssigned DateCompleted diff
05 1 0 0
06 18 4 17
07 18 4 0
08 20 4 2
4th column diff is to just explain things better
Let me explain my requirement - the 7th month doesn't have any value to it, yet it should get the values from the previous month(6th) and carry forward the value to the 8th month
data frame for reference :
DateAssigned DateCompleted month
2020-06-18 2020-06-19 2020-06-18
2020-06-18 NaT 2020-06-18
2020-06-19 NaT 2020-06-19
2020-06-18 2020-06-18 2020-06-18
2020-06-23 NaT 2020-06-23
2020-06-04 NaT 2020-06-04
2020-06-18 NaT 2020-06-18
2020-06-18 2020-06-18 2020-06-18
2020-06-05 NaT 2020-06-05
2020-06-18 2020-06-18 2020-06-18
2020-06-05 NaT 2020-06-05
2020-06-05 NaT 2020-06-05
2020-06-17 NaT 2020-06-17
2020-06-18 NaT 2020-06-18
2020-06-17 NaT 2020-06-17
2020-06-18 NaT 2020-06-18
2020-06-04 NaT 2020-06-04
2020-06-05 NaT 2020-06-05
Upvotes: 0
Views: 1480
Reputation: 35135
Calculate the difference using 'shift()' with the grouped and counted DFs. The output example is an arbitrary result I created, you can replace it with your own.
df["DateAssigned"] = pd.to_datetime(df["DateAssigned"])
df["DateCompleted"] = pd.to_datetime(df["DateCompleted"])
df = df.groupby(df["DateAssigned"].dt.month.rename("Month")).count()
df['diff'] = df['DateAssigned']-df['DateAssigned'].shift()
df
DateAssigned DateCompleted month diff
Month
5 1 0 1 NaN
6 18 4 18 17.0
7 9 2 9 -9.0
8 12 2 12 3.0
Upvotes: 0
Reputation: 5294
This should do, use a proper index to adapt it to your actual dataframe
df.groupby(df["DateAssigned"].dt.month.rename("Month")).count().cumsum().reindex([5,6,7,8]).ffill().reset_index()
Upvotes: 1
Reputation: 210
Try using ffill method. This will Forwardfill the previous values if the values are blank or NaN.
df.fillna(method='ffill')
Upvotes: 0