Reputation: 7663
for index, row in df.iterrows():
    print(index)
    name = row['name']
    new_name = get_name(name)
    row['new_name'] = new_name
    df.loc[index] = row
My testing shows that the last line of this code makes it really slow. It effectively inserts the new column one row at a time. Should I instead collect all the 'new_name' values in a list and update the df once, outside the loop?
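The list idea can be sketched roughly as follows. This is only a minimal sketch: the sample DataFrame and the body of get_name are placeholders, since the real function and data are not shown here.

```python
import pandas as pd

def get_name(name):
    # Hypothetical stand-in for the real get_name, which is not shown.
    return name.strip().title()

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"name": [" alice ", "BOB", "carol"]})

# Collect the results in a plain list, then assign the whole column once
# instead of writing back into the DataFrame on every iteration.
new_names = [get_name(name) for name in df["name"]]
df["new_name"] = new_names
```

The column is assigned once, so the DataFrame is not modified inside the loop.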
Upvotes: 2
Views: 33
Reputation: 862831
Use Series.apply to run a function on each value of a column; it is faster than iterrows:
df['new_name'] = df['name'].apply(get_name)
If you want to improve performance further, you need to change the function itself, if that is possible; whether it is depends on what the function does.
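As an illustration of what changing the function could look like: if get_name only did simple string cleanup (an assumption, since the real function is not shown), the per-value call could be replaced with pandas' vectorized string methods. The sample data below is hypothetical too.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"name": [" alice ", "BOB", "carol"]})

# Assumes get_name is equivalent to "strip whitespace, then title-case".
# The .str accessor applies each step to the whole column at once.
df["new_name"] = df["name"].str.strip().str.title()
```

Whether something like this is possible depends entirely on what get_name actually does; a function that calls an external service, for example, cannot be vectorized this way.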
Upvotes: 1
Reputation: 27966
df['new_name'] = df.apply(lambda row: get_name(row['name']), axis=1)
.apply isn't a best practice; however, I am not sure there is a better option here.
Upvotes: 0