fastest way to iterate pandas series/column

Question

I'm more used to for loops, but they can become slow in pandas once you get large sets of data. I keep finding iterrows, iter..., etc. examples but want to know if there's a faster way. What I currently have is:

newnames = []
names = df['name'].tolist()
for i in names:
  i = i.replace(' ','_')
  newnames.append(i)

I could then add the newnames list to the df as a new pandas column. Or should I rewrite the existing df['name'] values in place? I'm not too familiar with pandas best practices, so I welcome all feedback. Thanks

SeaBean · Accepted Answer

If you ultimately want to add newnames to df as a new column, you can do it directly:

df['newnames'] = df['name'].str.replace(' ', '_')

If you just want to replace all spaces with _ in the name column, you can also overwrite the original column directly:

df['name'] = df['name'].str.replace(' ', '_')

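Both approaches use pandas' vectorized string methods (the .str accessor), which apply the replacement to the whole column in a single call. This is the idiomatic pandas approach and avoids explicit row-by-row iteration such as a Python for loop or iterrows, which becomes slow on large data.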
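As a quick sanity check, here is a minimal sketch that runs the loop from the question and the .str.replace call side by side and confirms they produce the same values. The sample data is made up for illustration, and passing regex=False is my own addition (not in the original answer) to make the literal-string match explicit regardless of pandas version.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'name' comes from the question.
df = pd.DataFrame({'name': ['John Smith', 'Mary Ann Lee', 'Bob']})

# Loop-based version from the question.
newnames = []
for i in df['name'].tolist():
    newnames.append(i.replace(' ', '_'))

# Vectorized version from the answer.
# regex=False (an addition here) treats ' ' as a literal string, not a pattern.
df['newnames'] = df['name'].str.replace(' ', '_', regex=False)

# Both approaches should yield identical values.
print(df['newnames'].tolist() == newnames)
```

If you don't need to keep the original values, assign the result back to df['name'] instead of creating a new column.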
