SivakumarTG
SivakumarTG

Reputation: 23

Python pandas dataframe get value from previous row and groupby Months,year

I already a group by done with getting previous row values in a data frame.

groupby_result = df.groupby([df['DateAssigned'].dt.strftime('%m').rename('Month')]).agg({'count'}).cumsum().reset_index()

The above lines give me a result like this

 Month   DateAssigned  DateCompleted        
  05        1             0

  06        18            4

So the code gets the cumulative values for the months that are only present. I want the cumulative values to be carried forwarded even if a month doesn't have a record for it.

my expected result :

Month   DateAssigned   DateCompleted   diff
05            1             0           0
06           18             4           17
07           18             4           0
08           20             4           2

4th column diff is to just explain things better

Let me explain my requirement - the 7th month doesn't have any value to it, yet it should get the values from the previous month(6th) and carry forward the value to the 8th month

data frame for reference :

DateAssigned DateCompleted month

2020-06-18    2020-06-19  2020-06-18
2020-06-18           NaT  2020-06-18
2020-06-19           NaT  2020-06-19
2020-06-18    2020-06-18  2020-06-18
2020-06-23           NaT  2020-06-23
2020-06-04           NaT  2020-06-04      
2020-06-18           NaT  2020-06-18
2020-06-18    2020-06-18  2020-06-18
2020-06-05           NaT  2020-06-05
2020-06-18    2020-06-18  2020-06-18
2020-06-05           NaT  2020-06-05
2020-06-05           NaT  2020-06-05
2020-06-17           NaT  2020-06-17
2020-06-18           NaT  2020-06-18
2020-06-17           NaT  2020-06-17
2020-06-18           NaT  2020-06-18
2020-06-04           NaT  2020-06-04
2020-06-05           NaT  2020-06-05

Upvotes: 0

Views: 1480

Answers (3)

r-beginners
r-beginners

Reputation: 35135

Calculate the difference using 'shift()' with the grouped and counted DFs. The output example is an arbitrary result I created, you can replace it with your own.

df["DateAssigned"] = pd.to_datetime(df["DateAssigned"])
df["DateCompleted"] = pd.to_datetime(df["DateCompleted"])
df = df.groupby(df["DateAssigned"].dt.month.rename("Month")).count()
df['diff'] = df['DateAssigned']-df['DateAssigned'].shift()

df

DateAssigned    DateCompleted   month   diff
Month               
5       1   0   1   NaN
6       18  4   18  17.0
7       9   2   9   -9.0
8       12  2   12  3.0

Upvotes: 0

filippo
filippo

Reputation: 5294

This should do, use a proper index to adapt it to your actual dataframe

df.groupby(df["DateAssigned"].dt.month.rename("Month")).count().cumsum().reindex([5,6,7,8]).ffill().reset_index()

Upvotes: 1

PerlBatch
PerlBatch

Reputation: 210

Try using ffill method. This will Forwardfill the previous values if the values are blank or NaN.

df.fillna(method='ffill')

Upvotes: 0

Related Questions