Reputation: 1436

Replace all occurrences of a string in a pandas dataframe (Python)

I have a pandas dataframe with about 20 columns.

It is possible to replace all occurrences of a string (here a newline) by manually writing all column names:

df['columnname1'] = df['columnname1'].str.replace("\n","<br>")
df['columnname2'] = df['columnname2'].str.replace("\n","<br>")
df['columnname3'] = df['columnname3'].str.replace("\n","<br>")
...
df['columnname20'] = df['columnname20'].str.replace("\n","<br>")

This unfortunately does not work:

df = df.replace("\n","<br>")

Is there any other, more elegant solution?

Upvotes: 73

Answers (4)

Alex Riley

Reputation: 176830

You can use replace and pass the strings to find and their replacements as dictionary keys and values:

df.replace({'\n': '<br>'}, regex=True)

For example:

>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
   a    b
0  1\n  4\n
1  2\n  5
2  3    6\n

>>> df.replace({'\n': '<br>'}, regex=True)
   a      b
0  1<br>  4<br>
1  2<br>  5
2  3      6<br>

Note that this method returns a new DataFrame instance by default (it does not modify the original), so you'll need to either reassign the output:

df = df.replace({'\n': '<br>'}, regex=True)

or specify inplace=True:

df.replace({'\n': '<br>'}, regex=True, inplace=True)
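
As a rough illustration of why the plain df.replace("\n", "<br>") from the question appears to do nothing: without regex=True, replace only matches cells whose entire value equals the search string. The small frame below is made up for the example and is not from the question.

```python
import pandas as pd

# Hypothetical data: one cell is exactly '\n', the others merely contain it.
df = pd.DataFrame({'a': ['1\n', '\n'], 'b': ['x\ny', 'z']})

# Without regex=True, only cells that equal '\n' exactly are replaced.
whole_cell = df.replace('\n', '<br>')

# With regex=True, every occurrence inside each string is replaced.
substring = df.replace('\n', '<br>', regex=True)

print(whole_cell['a'].tolist())
print(substring['b'].tolist())
```

The dictionary form shown above and the two-argument form with regex=True should behave the same for a single pattern; the dictionary is mainly convenient when several replacements are needed at once.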

Upvotes: 119

Mykola Zotko

Reputation: 17824

You can iterate over all columns and use the method str.replace:

for col in df.columns:
   df[col] = df[col].str.replace('\n', '<br>')

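In pandas versions before 2.0, str.replace treats the pattern as a regular expression by default; since 2.0 the default is regex=False. Note that the .str accessor raises an AttributeError on non-string columns, so this loop only works if every column holds strings.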
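
A minimal sketch of a more defensive variant of this loop, assuming the frame may also contain non-string columns. The column names and data are hypothetical, and using pd.api.types.is_string_dtype as the filter is one possible choice, not the only one.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical frame with one text column and one numeric column.
df = pd.DataFrame({'text': ['1\n', '2'], 'number': [1, 2]})

for col in df.columns:
    # Skip columns that do not hold strings; .str would raise on them.
    if is_string_dtype(df[col]):
        # regex=False makes the literal match explicit on every pandas version.
        df[col] = df[col].str.replace('\n', '<br>', regex=False)

print(df)
```

Passing regex=False explicitly avoids depending on the default, which differs between pandas versions.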

Upvotes: 6

Jasper Kinoti

Reputation: 499

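This removes all whitespace, including newlines and spaces between words. You can change the '' in ''.join to specify a replacement string (for example ' '.join to collapse each run of whitespace into a single space).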

    df['columnname'] = [''.join(c.split()) for c in df['columnname'].astype(str)]
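
To make the effect of the join separator concrete, here is a small plain-Python sketch on a single made-up string; the same expressions would apply to each cell in the list comprehension above.

```python
# Hypothetical cell value with a newline and repeated spaces.
text = "first line\nsecond   line"

# str.split() with no arguments splits on any run of whitespace.
removed = ''.join(text.split())    # all whitespace dropped
collapsed = ' '.join(text.split())  # each whitespace run becomes one space

print(removed)    # firstlinesecondline
print(collapsed)  # first line second line
```

Note that this affects every kind of whitespace, not only newlines, so it is a different operation from replacing "\n" with "<br>".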

Upvotes: -1

Yichuan Wang

Reputation: 753

It seems pandas has changed its API to avoid ambiguity when handling regex. Now you should use:

df.replace({'\n': '<br>'}, regex=True)

For example:

>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
   a    b
0  1\n  4\n
1  2\n  5
2  3    6\n

>>> df.replace({'\n': '<br>'}, regex=True)
   a      b
0  1<br>  4<br>
1  2<br>  5
2  3      6<br>

Upvotes: 23
