Reputation: 686
I have a dataframe with 7 columns and 11,000 rows. Some of these 7 columns contain strings. In column 2, row 1000, there is the string 'London', and I want to change it to 'Paris'. How can I do this? I searched all over the web but couldn't find a way. I tried these commands, but none of them works:
df['column2'].replace('London','Paris')
df['column2'].str.replace('London','Paris')
re.sub('London','Paris',df['column2'])
I usually receive this error:
TypeError: expected string or bytes-like object
Upvotes: 3
Views: 104
Reputation: 44638
These are all great answers, but many of them are not vectorized: they operate on one item at a time rather than on the entire Series at once.
A very reliable filter + replace strategy is to create a mask or subset True/False series and then use loc with that series to replace:
mask = df.country == 'London'
df.loc[mask, 'country'] = 'Paris'
# On 10m records:
# this method < 1 second
# @Charles method 1 < 10 seconds
# @Charles method 2 < 3.5 seconds
# @jose method didn't bother because it would be 30 seconds or more
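For anyone who wants to try the mask approach end to end, here is a minimal self-contained sketch. The toy data is made up for illustration; only the column name country comes from the snippet above.

```python
import pandas as pd

# Toy data (assumed for illustration); 'country' matches the snippet above.
df = pd.DataFrame({'country': ['New York', 'London', 'Berlin', 'London']})

# Boolean Series: True where the value is exactly 'London'.
mask = df['country'] == 'London'

# Assigning through .loc modifies the original frame in place.
df.loc[mask, 'country'] = 'Paris'

print(df['country'].tolist())  # ['New York', 'Paris', 'Berlin', 'Paris']
```

Note that the comparison is an exact match, so a value such as 'London, UK' would be left untouched.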
Upvotes: 0
Reputation: 3316
If you want to replace a single row (you mention row 1000), you can do it with .loc. If you want to replace all occurrences of 'London', you could do this:
import pandas as pd
df = pd.DataFrame({'country': ['New York', 'London'],})
df.country = df.country.str.replace('London', 'Paris')
Alternatively, you could write your own replacement function, and then use .apply:
def replace_country(string):
    if string == 'London':
        return 'Paris'
    return string
df.country = df.country.apply(replace_country)
The second method is a bit overkill, but is a good example that generalizes better for more complex tasks.
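For the single-row case mentioned at the start of this answer, a rough sketch with .loc might look like the following. The frame, the column name column2, and the row label 1 are stand-ins for the asker's data (row 1000 in the question), and it assumes the default integer index, where the row label equals the row position.

```python
import pandas as pd

# Hypothetical small frame standing in for the asker's data.
df = pd.DataFrame({'column2': ['Berlin', 'London', 'London']})

row = 1  # stand-in for row 1000 in the question

# Only overwrite the cell if it really holds 'London'.
if df.loc[row, 'column2'] == 'London':
    df.loc[row, 'column2'] = 'Paris'

print(df['column2'].tolist())  # ['Berlin', 'Paris', 'London']
```

Only the addressed cell changes; the other 'London' in the column stays as it is.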
Upvotes: 3
Reputation: 764
re.sub expects a single string, not a Series, so check for non-string values first and apply it element by element:
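import re
re_map = {'London': 'Paris'}  # example mapping (assumed): regex pattern -> replacement
for pattern, repl in re_map.items():
    # Apply re.sub to each value; non-string values (e.g. NaN) are left unchanged.
    df['column2'] = [re.sub(pattern, repl, x) if isinstance(x, str) else x for x in df['column2']]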
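If the mapping holds regular expressions, the same idea can probably be expressed without an explicit loop by passing the dict to Series.replace with regex=True. This is only a sketch: the contents of re_map and the sample data are assumptions, since the original mapping is not shown.

```python
import pandas as pd

# Hypothetical mapping (regex pattern -> replacement); the real re_map is not shown above.
re_map = {r'^London$': 'Paris'}

# Sample data (assumed), including a missing value and a near match.
df = pd.DataFrame({'column2': ['London', 'Londonderry', None, 'Rome']}, dtype=object)

# Series.replace returns a new Series, so the result must be assigned back.
df['column2'] = df['column2'].replace(re_map, regex=True)

print(df['column2'].tolist())
```

The anchors ^ and $ keep 'Londonderry' from being rewritten. The assignment back to df['column2'] is the step missing from the first attempt in the question, where the returned Series was discarded.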
Upvotes: 0