How to speed up nested for loop with dataframe?

Question

I have a dataframe like this:

test = pd.DataFrame({'id':['a','C','D','b','b','D','c','c','c'], 'text':['a','x','a','b','b','b','c','c','c']})

With the following for loop I can copy the text value 'x' from a row whose id is 'C' into new_col of the next row, when that next row's id is 'D'. The loop works fine on this small dataframe, but on dataframes with thousands of rows it takes many hours. Any suggestions to speed it up?

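test['new_col'] = ''  # create the target column before assigning into it
for index, row in test.iterrows():
    # skip the last row so index + 1 stays in range (assumes the default 0..n-1 index)
    if row['id'] == 'C' and index + 1 < len(test):
        if test['id'][index + 1] == 'D':
            test.loc[index + 1, 'new_col'] = test['text'][index]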

Vikas Periyadath · Accepted Answer

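Try shift() with a vectorized condition. shift() moves a column down by one row, so each row can be compared with the row before it without a Python loop.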

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c'], 
                   'text': ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']})


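# temp_col holds the id of the previous row
df['temp_col'] = df['id'].shift()
# where this row is 'D' and the previous row is 'C', take the previous row's text
df['new_col'] = np.where((df['id'] == 'D') & (df['temp_col'] == 'C'), df['text'].shift(), "")
del df['temp_col']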
print(df)
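
To see why this works, here is a minimal sketch of what shift() does on its own. The series `s` and the names `prev` and `pairs` are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical four-row series, only to illustrate shift()
s = pd.Series(['a', 'C', 'D', 'b'])

# shift() moves every value down one row; the first row has no predecessor and becomes NaN
prev = s.shift()

# Pair each value with the value from the row above it
pairs = list(zip(s, prev))
```

Row 2 holds 'D' and its shifted partner is 'C', which is exactly the pair the condition in the answer looks for.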

We can also do it without a temporary column. (Thanks and credit to Prayson 🙂)

df['new_col'] = np.where((df['id'].eq('D')) & (df['id'].shift().eq('C')), df['text'].shift(), "")
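
If you want to convince yourself that the vectorized version gives the same result as the loop, a quick check along these lines may help. This is only a sketch: the plain-list loop, and the names `expected`, `vectorized` and `matches`, are assumptions added for the comparison, and rows that do not match are filled with an empty string as in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c'],
                   'text': ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']})

# Reference result from a bounds-checked loop over plain lists
ids = df['id'].tolist()
texts = df['text'].tolist()
expected = [""] * len(df)
for i in range(len(df) - 1):
    if ids[i] == 'C' and ids[i + 1] == 'D':
        expected[i + 1] = texts[i]

# Vectorized result: current row is 'D' and the previous row is 'C'
vectorized = np.where(df['id'].eq('D') & df['id'].shift().eq('C'),
                      df['text'].shift(), "")

# True when both approaches fill the same rows with the same values
matches = list(vectorized) == expected
```

The loop here is only a reference for the check. On large dataframes the np.where line is the one to use, since it runs on whole columns at once instead of row by row.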
