CFD

Reputation: 686

Replacing a string in a dataframe in Python

I have a dataframe with 7 columns and 11,000 rows. Some of these 7 columns contain strings. In column 2, row 1000, there is the string 'London', and I want to change it to 'Paris'. How can I do this? I searched all over the web but couldn't find a way. I tried these commands, but none of them work:

df['column2'].replace('London','Paris')
df['column2'].str.replace('London','Paris')
re.sub('London','Paris',df['column2'])

I usually receive this error:

TypeError: expected string or bytes-like object

Upvotes: 3

Views: 104

Answers (3)

Brandon Bertelsen

Reputation: 44638

These are all great answers, but many are not vectorized: they process the series one item at a time instead of operating on the entire series at once.

A very reliable filter + replace strategy is to create a boolean mask (a True/False series) and then use loc with that mask to assign the new value:

mask = df.country == 'London' 
df.loc[mask, 'country'] = 'Paris'

# Timings on 10 million records:
#   this method: < 1 second
#   @Charles method 1: < 10 seconds
#   @Charles method 2: < 3.5 seconds
#   @jose method: not timed, because it would take 30 seconds or more
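
For anyone who wants to check the comparison on their own machine, here is a rough sketch of how the mask and .apply approaches could be timed. The sample data, the row count n, and the helper names with_mask and with_apply are made up for illustration; actual timings will depend on hardware and pandas version.

```python
import time

import pandas as pd

n = 100_000  # assumed size; increase it to approach the 10 million rows above
base = pd.DataFrame({'country': ['London', 'New York'] * (n // 2)})


def with_mask(df):
    # Vectorized: build a boolean mask and assign through .loc
    df = df.copy()
    df.loc[df['country'] == 'London', 'country'] = 'Paris'
    return df


def with_apply(df):
    # Element-wise: a Python function is called once per value
    df = df.copy()
    df['country'] = df['country'].apply(
        lambda s: 'Paris' if s == 'London' else s
    )
    return df


results = {}
timings = {}
for name, func in [('mask', with_mask), ('apply', with_apply)]:
    start = time.perf_counter()
    results[name] = func(base)
    timings[name] = time.perf_counter() - start
    print(f'{name}: {timings[name]:.4f} s')
```

Both functions work on a copy, so base is left untouched and each approach starts from the same data.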

Upvotes: 0

Charles

Reputation: 3316

If you want to replace a single row (you mention row 1000), you can do it with .loc. If you want to replace all occurrences of 'London', you could do this:

import pandas as pd
df = pd.DataFrame({'country': ['New York', 'London'],})
df.country = df.country.str.replace('London', 'Paris')
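
Two details are easy to miss here. First, both .replace and .str.replace return a new series, so the result has to be assigned back (calling them without assignment, as in the question, leaves df unchanged). Second, .str.replace substitutes substrings, while Series.replace called with two plain strings only swaps values that match exactly. A small sketch of the difference, using a made-up sample series:

```python
import pandas as pd

s = pd.Series(['London', 'Londonderry', 'New York'])

# Substring replacement: every occurrence of 'London' inside a value changes
substring_result = s.str.replace('London', 'Paris')

# Whole-value replacement: only cells equal to 'London' change
exact_result = s.replace('London', 'Paris')

print(substring_result.tolist())
print(exact_result.tolist())

# Neither call modified s itself
print(s.tolist())
```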

Alternatively, you could write your own replacement function, and then use .apply:

def replace_country(string):
    if string == 'London':
        return 'Paris'
    return string

df.country = df.country.apply(replace_country)

The second method is a bit overkill, but is a good example that generalizes better for more complex tasks.
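
For the single-row case mentioned at the start, a minimal sketch with .loc could look like the following. The sample data and the row label are assumptions (row 1 stands in for row 1000 of the real dataframe), and it assumes the default integer index:

```python
import pandas as pd

df = pd.DataFrame({'column2': ['New York', 'London', 'London']})

row_label = 1  # hypothetical; would be 1000 in the original dataframe

# Assign to one cell, selected by row label and column name
df.loc[row_label, 'column2'] = 'Paris'

print(df['column2'].tolist())
```

Only the selected cell changes; the other 'London' in the column is left alone.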

Upvotes: 3

Jose Angel Sanchez

Reputation: 764

Before replacing, check the column for non-string values, since re only works on strings. Then apply each pattern with re:

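import re

# Example mapping of regex pattern -> replacement (assumed; adapt to your data)
re_map = {'London': 'Paris'}
for pattern, replacement in re_map.items():
    df['column2'] = [re.sub(pattern, replacement, x) for x in df['column2']]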
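
The TypeError in the question is what re.sub raises whenever its third argument is not a string, whether that is a whole series or a single number or missing value inside the column. A minimal sketch that leaves such values alone, assuming a made-up sample column and a hypothetical re_map dictionary of pattern -> replacement:

```python
import re

import pandas as pd

# Sample column mixing strings with non-string values (illustrative only)
df = pd.DataFrame({'column2': ['London', 42, None, 'New London']})

# Hypothetical mapping of regex pattern -> replacement
re_map = {r'London': 'Paris'}

for pattern, replacement in re_map.items():
    df['column2'] = [
        # Only call re.sub on real strings; pass everything else through
        re.sub(pattern, replacement, x) if isinstance(x, str) else x
        for x in df['column2']
    ]

print(df['column2'].tolist())
```

This is still an element-by-element loop, so it will be slower than the vectorized options on large columns.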

Upvotes: 0
