Alina

Reputation: 61

How to speed up nested for loop with dataframe?

I have a dataframe like this:

test = pd.DataFrame({'id':['a','C','D','b','b','D','c','c','c'], 'text':['a','x','a','b','b','b','c','c','c']})

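Using the following for loop, I can copy the text value 'x' into a new column, new_col: whenever a row's id is 'C' and the next row's id is 'D', the text of the 'C' row is written to the 'D' row. The loop works fine for this small dataframe, but for dataframes with thousands of rows it takes many hours to process. Any suggestions to speed it up?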

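test['new_col'] = ''  # create the column up front so there is something to assign into
for index, row in test.iterrows():
    if row['id'] == 'C' and index + 1 < len(test):  # guard against running past the last row
        if test.loc[index + 1, 'id'] == 'D':
            # .loc avoids chained assignment, which may write to a temporary copy
            test.loc[index + 1, 'new_col'] = row['text']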

Upvotes: 0

Views: 101

Answers (1)

Vikas Periyadath

Reputation: 3186

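Try using shift() with a vectorized condition instead of a loop. shift() moves a column down by one row, so each row can be compared with the row before it in a single operation.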

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c'], 
                   'text': ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']})


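# previous row's id, aligned with the current row
df['temp_col'] = df['id'].shift()
# where the current id is 'D' and the previous id is 'C', take the previous row's text; otherwise ""
df['new_col'] = np.where((df['id'] == 'D') & (df['temp_col'] == 'C'), df['text'].shift(), "")
del df['temp_col']  # drop the helper column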
print(df)
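
To see why this works, it may help to look at the shifted values next to the originals. The snippet below is only an illustrative sketch; prev_id, prev_text, inspect, mask and matched are hypothetical names that are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c'],
                   'text': ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']})

# shift() moves every value down one row; the first row has no predecessor, so it becomes NaN
inspect = df.assign(prev_id=df['id'].shift(), prev_text=df['text'].shift())

# rows whose own id is 'D' and whose previous row's id is 'C'
mask = (inspect['id'] == 'D') & (inspect['prev_id'] == 'C')
matched = inspect[mask]

print(inspect)
print(matched)
```

Only the row at index 2 satisfies the condition here, and its prev_text is the 'x' that ends up in new_col.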

We can also do it without a temporary column (thanks and credit to Prayson 🙂):

df['new_col'] = np.where((df['id'].eq('D')) & (df['id'].shift().eq('C')), df['text'].shift(), "")
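
If you want to convince yourself that the vectorized version gives the same result as the loop before running it on a large dataframe, a quick check along these lines might help. This is a sketch, not a benchmark: fill_loop and fill_vectorized are hypothetical helper names, and the loop version assumes the default 0..n-1 integer index.

```python
import numpy as np
import pandas as pd

def fill_loop(frame):
    # row-by-row reference version (assumes the default RangeIndex)
    out = frame.copy()
    out['new_col'] = ""
    for i in range(len(out) - 1):
        if out.loc[i, 'id'] == 'C' and out.loc[i + 1, 'id'] == 'D':
            out.loc[i + 1, 'new_col'] = out.loc[i, 'text']
    return out

def fill_vectorized(frame):
    # same rule expressed with shift() and np.where
    out = frame.copy()
    hit = out['id'].eq('D') & out['id'].shift().eq('C')
    out['new_col'] = np.where(hit, out['text'].shift(), "")
    return out

test = pd.DataFrame({'id': ['a', 'C', 'D', 'b', 'b', 'D', 'c', 'c', 'c'],
                     'text': ['a', 'x', 'a', 'b', 'b', 'b', 'c', 'c', 'c']})

loop_result = fill_loop(test)
vec_result = fill_vectorized(test)

# both approaches should fill new_col identically
print(loop_result['new_col'].tolist() == vec_result['new_col'].tolist())
```

Working on copies keeps the original dataframe untouched, so the two results can be compared side by side.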

Upvotes: 2
