thePandasFriend

Reputation: 113

Best way to iterate over a Pandas Dataframe?

I have a dataframe like so:

x  y  someVal someOtherVal
1  2  hello   heyhey
2  1  hello   heyhey

and want to iterate over each row to check if x < y, then append 'LT' to someVal to achieve this:

x  y  someVal someOtherVal
1  2  helloLT heyhey
2  1  hello   heyhey

I read in the documentation that iterating is not good practice and can lead to incorrect results, so I'm not sure what to do.

Upvotes: 1

Views: 163

Answers (2)

timgeb

Reputation: 78690

Usually explicit iteration can and should be avoided. The internal vectorized operations are much faster than Python for loops.

In this specific case, use

df.loc[df['x'] < df['y'], 'someVal'] += 'LT'
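For anyone who wants to run it, here is a minimal sketch that rebuilds the frame from the question and applies the same line. The variable name `mask` is only for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "x": [1, 2],
    "y": [2, 1],
    "someVal": ["hello", "hello"],
    "someOtherVal": ["heyhey", "heyhey"],
})

# Boolean Series: True for the rows where x < y.
mask = df["x"] < df["y"]

# Append 'LT' only in the selected rows of the someVal column.
df.loc[mask, "someVal"] += "LT"

print(df)
```

Only the first row satisfies x < y, so only its someVal changes.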

Thanks! Is there a way to apply the less-than check only when someVal == someOtherVal?

df.loc[(df['x'] < df['y']) & (df['someVal'] == df['someOtherVal']), 'someVal'] += 'LT'

or

df.loc[df['x'].lt(df['y']) & df['someVal'].eq(df['someOtherVal']), 'someVal'] += 'LT'
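A small sketch of the combined condition. The frame from the question never has someVal equal to someOtherVal, so the third row below is made up purely to show a match; it is not from the original data.

```python
import pandas as pd

# Question's data plus one hypothetical row where someVal == someOtherVal.
df = pd.DataFrame({
    "x": [1, 2, 1],
    "y": [2, 1, 2],
    "someVal": ["hello", "hello", "heyhey"],
    "someOtherVal": ["heyhey", "heyhey", "heyhey"],
})

# Each comparison needs its own parentheses, because & binds more
# tightly than < and == in Python.
mask = (df["x"] < df["y"]) & (df["someVal"] == df["someOtherVal"])

df.loc[mask, "someVal"] += "LT"

print(df)
```

The first row has x < y but the two string columns differ, so it stays unchanged; only the made-up third row meets both conditions. The `.lt()` / `.eq()` form avoids the parentheses issue because the comparisons are method calls.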

Upvotes: 4

Scott Boston

Reputation: 153460

You can try this:

import numpy as np
df['someVal'] = df['someVal'] + np.where(df['x'] < df['y'], 'LT', '')

Output:

   x  y  someVal someOtherVal
0  1  2  helloLT       heyhey
1  2  1    hello       heyhey

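This runs as one vectorized step with no explicit Python loop: np.where builds an array of suffixes ('LT' where x < y, '' elsewhere), which is then concatenated element-wise onto someVal.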
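A self-contained sketch that checks the np.where approach against the .loc approach on the question's data. One deliberate variation, not part of the answer above: the suffix array is wrapped in a Series that shares the frame's index, so the addition is plainly Series plus Series. The names `make_df`, `suffix`, `a` and `b` are only illustrative.

```python
import numpy as np
import pandas as pd

def make_df():
    # Fresh copy of the example frame from the question.
    return pd.DataFrame({
        "x": [1, 2],
        "y": [2, 1],
        "someVal": ["hello", "hello"],
        "someOtherVal": ["heyhey", "heyhey"],
    })

# Approach 1: build a suffix per row and concatenate.
a = make_df()
suffix = pd.Series(np.where(a["x"] < a["y"], "LT", ""), index=a.index)
a["someVal"] = a["someVal"] + suffix

# Approach 2: boolean mask with .loc, modifying matching rows in place.
b = make_df()
b.loc[b["x"] < b["y"], "someVal"] += "LT"

# Both approaches should produce the same column.
print(a["someVal"].tolist() == b["someVal"].tolist())
```

The np.where form rewrites the whole column, while the .loc form only touches the matching rows; on this data the results are identical.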

Upvotes: 2
