marlon

Reputation: 7663

Is there a way to make modifying a DataFrame in a loop faster?

    for index, row in df.iterrows():
        print(index)

        name = row['name']
        new_name = get_name(name)
        row['new_name'] = new_name

        df.loc[index] = row

In this piece of code, my testing shows that the last line is what makes it really slow. It basically inserts a new column row by row. Maybe I should store all the 'new_name' values in a list and update the df outside of the loop?

Upvotes: 2

Views: 33

Answers (2)

jezrael

Reputation: 862831

Use Series.apply to run the function on each value of the column; it is faster than iterrows:

    df['new_name'] = df['name'].apply(get_name)

If you want to improve performance further, you need to change the function itself if possible, but that depends on the function.
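As an illustration of what changing the function can look like: the real get_name is not shown in the question, so the sketch below assumes a hypothetical one that only strips whitespace and title-cases the string. Under that assumption, the same result can be produced with pandas' vectorized string methods instead of calling a Python function per value.

```python
import pandas as pd


# Hypothetical stand-in for get_name; the real function is not shown in the question.
def get_name(name: str) -> str:
    return name.strip().title()


# Small made-up sample data for the sketch.
df = pd.DataFrame({"name": ["  alice smith", "BOB JONES ", "carol"]})

# Element-wise version: calls the Python function once per value.
applied = df["name"].apply(get_name)

# Vectorized version: the same two steps expressed through the .str accessor.
df["new_name"] = df["name"].str.strip().str.title()
```

Whether such a rewrite is possible depends entirely on what get_name really does; if it cannot be expressed with built-in pandas or NumPy operations, Series.apply remains a reasonable choice.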

Upvotes: 1

Gulzar

Reputation: 27966

    # axis=1 passes each row to the lambda; get_name only needs the 'name' value
    df['new_name'] = df.apply(lambda row: get_name(row['name']), axis=1)

.apply isn't best practice; however, I am not sure there is a better option here.
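One alternative, along the lines of what the question itself suggests, is to collect the values in a list and assign the column once outside the loop. This is only a sketch: get_name here is a hypothetical placeholder (it just upper-cases the string), and the sample data is made up.

```python
import pandas as pd


# Hypothetical placeholder for get_name; the real function is not shown in the question.
def get_name(name: str) -> str:
    return name.upper()


# Small made-up sample data for the sketch.
df = pd.DataFrame({"name": ["ann", "bob", "ann"]})

# Build all new values first, iterating over the column rather than over rows.
new_names = [get_name(name) for name in df["name"]]

# Assign the whole column in a single operation instead of writing row by row.
df["new_name"] = new_names
```

This avoids the per-row df.loc assignment from the question, which appears to be the expensive part; the function is still called once per value.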

Upvotes: 0
