Reputation: 1568
How do I create a new dataframe bigdf with a calculated column that iterates through every row from another dataframe df? I receive empty rows in the new dataframe bigdf.
# Import pandas library
import pandas as pd
import numpy as np
# DataFrame
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
Iterate through every row to calculate each row's age minus the average:
bigdf = pd.DataFrame()
for index, row in df.iterrows():
bigdf['score'] = row['Age']-np.average(df['Age'])
print(bigdf)
Empty DataFrame
Columns: [score]
Index: []
Upvotes: 0
Views: 2004
Reputation: 1728
Your implementation will be very slow. Note that you compute the mean of df['Age'] in every iteration of your for loop - this has quadratic runtime. You should never iterate over rows of a dataframe unless there is really no other option.
In this case, you are simply trying to compute the signed difference between each row's age and the average age, which can be done with a single vectorized operation:
bigdf['score'] = df['Age'] - df['Age'].mean()
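For completeness, here is a minimal self-contained sketch of that line using the sample data from the question. Building bigdf directly from the result (instead of starting from an empty pd.DataFrame()) is just one way to write it, not the only one.

```python
import pandas as pd

# Sample data copied from the question
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Vectorized: the mean is computed once and subtracted from every row.
# The resulting Series carries df's index, so bigdf gets one row per row of df.
bigdf = pd.DataFrame({'score': df['Age'] - df['Age'].mean()})
print(bigdf)
```

With this data the mean age is 13, so the scores come out as -3.0, 2.0 and 1.0. As for the empty result in the question: bigdf has no rows when the loop runs, so assigning a scalar to bigdf['score'] has nothing to broadcast over and the column stays empty on every iteration.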
Upvotes: 2