Tom Holland
Tom Holland

Reputation: 145

How to sum of certain values using pandas datetime operations

Headline is not clear. Let me explain.

I have a dataframe like this:

Order Quantity            Date Accepted        Date Delivered
     20                     01-05-2010            01-02-2011
     10                     01-11-2010            01-03-2011
     300                    01-12-2010            01-04-2011
     5                      01-03-2011            01-03-2012
     20                     01-04-2012            01-11-2013
     10                     01-07-2013            01-12-2014

I want to basically create another column that contains the total undelivered items for each row.

Expected output:

Order Quantity            Date Accepted        Date Delivered      Pending Order
     20                     01-05-2010            01-02-2011             20
     10                     01-11-2010            01-03-2011             30
     300                    01-12-2010            01-04-2011             330
     5                      01-03-2011            01-03-2012             305
     20                     01-04-2012            01-11-2013             20
     10                     01-07-2013            01-12-2014             30

Upvotes: 1

Views: 53

Answers (1)

hemanta
hemanta

Reputation: 1510

Here, I have taken a part of your dataframe and try to get the result.

df = pd.DataFrame({'order': [20, 10, 300, 200], 
                   'Date_aceepted': ['01-05-2010', '01-11-2010', '01-12-2010', '01-12-2010'],
                   'Date_delever': ['01-02-2011', '01-03-2011', '01-04-2011', '01-12-2010']})

  order Date_aceepted Date_delever
0     20    01-05-2010   01-02-2011
1     10    01-11-2010   01-03-2011
2    300    01-12-2010   01-04-2011
3    200    01-12-2010   01-12-2010

Then I will change the Date_accepted and Date_deliver to date time by using pandas data time module

df['date1'] = pd.to_datetime(df['Date_aceepted'])

df['date2'] = pd.to_datetime(df['Date_delever'])

Then I will make a new data frame in which the Date_accepted and Date_delever are not the same. I assume you just need that in your final result.

dff = df[df['date1'] != df['date2']]

You can see the last row in which both accepted and delever are same is now removed in dff.

   order Date_aceepted Date_delever      date1      date2
0     20    01-05-2010   01-02-2011 2010-01-05 2011-01-02
1     10    01-11-2010   01-03-2011 2010-01-11 2011-01-03
2    300    01-12-2010   01-04-2011 2010-01-12 2011-01-04

Then I did use pandas cumsum of pending order

dff['pending'] = dff['order'].cumsum()

and it gives

  order Date_aceepted Date_delever      date1      date2  pending
0     20    01-05-2010   01-02-2011 2010-01-05 2011-01-02       20
1     10    01-11-2010   01-03-2011 2010-01-11 2011-01-03       30
2    300    01-12-2010   01-04-2011 2010-01-12 2011-01-04      330

The final data frame has two extra columns that can be dropped if you don't want in your result.

Upvotes: 2

Related Questions