Starbucks

Reputation: 1568

Simple Calculation Iterating through Every Row - Pandas

How do I create a new dataframe bigdf with a calculated column by iterating through every row of another dataframe df? My attempt below produces an empty bigdf.

# Import pandas library
import pandas as pd
import numpy as np
  
# DataFrame
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])

Iterate through every row to calculate each row's age minus the average:

bigdf = pd.DataFrame()
for index, row in df.iterrows():
    bigdf['score'] = row['Age']-np.average(df['Age'])

print(bigdf)

Empty DataFrame
Columns: [score]
Index: []

Upvotes: 0

Views: 2004

Answers (1)

Andrew Eckart

Reputation: 1728

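Your implementation will be very slow. Note that you compute the mean of df['Age'] in every iteration of your for loop, and each mean is itself a pass over the whole column - this has quadratic runtime. It also explains the empty result: bigdf has no rows, so assigning a scalar to bigdf['score'] broadcasts over an empty index and adds a column with nothing in it. You should never iterate over rows of a dataframe unless there is really no other option.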

In this case, you are simply trying to compute the signed difference between each row's age and the average age, which can be done with a single vectorized operation:

bigdf['score'] = df['Age'] - df['Age'].mean()
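For completeness, here is a minimal self-contained sketch of the vectorized approach, using the sample data from the question. The names mean_age and empty are only illustrative and are not part of the original code. It also shows, under the same assumptions, why assigning a scalar to a column of an empty dataframe leaves it empty.

```python
import pandas as pd

# Sample data from the question
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Compute the mean once (illustrative name), instead of once per row
mean_age = df['Age'].mean()

# Subtracting a scalar from a Series is applied element-wise;
# building bigdf from the resulting Series gives it the same index as df
bigdf = pd.DataFrame({'score': df['Age'] - mean_age})
print(bigdf)

# Why the loop version stays empty: a scalar assigned to a column is
# broadcast over the existing index, and an empty frame has no index entries
empty = pd.DataFrame()
empty['score'] = 1.0
print(len(empty))
```

With the sample ages 10, 15 and 14 the mean is 13, so the scores come out as -3.0, 2.0 and 1.0, while the scalar-assigned frame still has zero rows.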

Upvotes: 2
