Tom Holland
Tom Holland

Reputation: 145

How to convert this forloop to pandas lambda function, to increase speed

This forloop will take 3 days to complete. How can I increase the speed?

for i in range(df.shape[0]):
    df.loc[df['Creation date'] >= pd.to_datetime(str(df['Original conf GI dte'].iloc[i])),'delivered'] += df['Sale order item'].iloc[i]

I think the forloop is enough to understand?

If Creation date is bigger than Original conf GI date, then add Sale order item value to delivered column.

Each row's date is "Date Accepted" (Date Delivered is future date). Input is Order Ouantity, Date Accepted & Date Delivered....Output is Delivered column

Order Quantity            Date Accepted        Date Delivered       Delivered
     20                     01-05-2010            01-02-2011           0
     10                     01-11-2010            01-03-2011           0
     300                    01-12-2010            01-09-2011           0     
     5                      01-03-2011            01-03-2012           30
     20                     01-04-2012            01-11-2013           335
     10                     01-07-2013            01-12-2014           335

Upvotes: 2

Views: 90

Answers (2)

jezrael
jezrael

Reputation: 862591

Convert values to numpy arrays by Series.to_numpy, compare them with broadcasting, match order values by numpy.where and last sum:

date1 = df['Date Accepted'].to_numpy()
date2 = df['Date Delivered'].to_numpy()
order = df['Order Quantity'].to_numpy()

#oldier pandas versions
#date1 = df['Date Accepted'].values
#date2 = df['Date Delivered'].values
#order = df['Order Quantity'].values

df['Delivered1'] = np.where(date1[:, None] >= date2, order, 0).sum(axis=1)
print (df)
   Order Quantity Date Accepted Date Delivered  Delivered  Delivered1
0              20    2010-01-05     2011-01-02          0           0
1              10    2010-01-11     2011-01-03          0           0
2             300    2010-01-12     2011-01-09          0           0
3               5    2011-01-03     2012-01-03         30          30
4              20    2012-01-04     2013-01-11        335         335
5              10    2013-01-07     2014-01-12        335         335

Upvotes: 3

anky
anky

Reputation: 75080

If I understand correctly, you can use np.where() for speed. Currently you are looping on the dataframe rows whereas numpy operations are designed to operate on the entire column:

cond= df['Creation date'].ge(pd.to_datetime(str(df['Original conf GI dte'])))
df['delivered']=np.where(cond,df['delivered']+df['Sale order item'],df['delivered'])

Upvotes: 1

Related Questions