I'm sure this problem has an easy answer, but I'm having trouble figuring out the correct pattern to use. I want to replace the domain of every email address in one column of a data frame with a new domain: replace the substring '@*', where * is any sequence of characters, with '@newcompany.com', keeping whatever comes before the @ as is. Thanks all.
df_users['EMAIL'] = df_users['EMAIL'].str.replace('@', '@newcompany.com')
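A quick check of why this attempt falls short: replacing only the '@' character puts the new domain in front of the old one instead of replacing it. The addresses below are hypothetical sample values, not data from the question.

```python
import pandas as pd

# Hypothetical sample data; the real EMAIL column is not shown in the question.
df_users = pd.DataFrame({'EMAIL': ['alice@oldcompany.com', 'bob@example.org']})

# Literal replacement of '@' only: the old domain stays after the new one.
attempt = df_users['EMAIL'].str.replace('@', '@newcompany.com', regex=False)
print(attempt.tolist())
# ['alice@newcompany.comoldcompany.com', 'bob@newcompany.comexample.org']
```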
You can use the vectorised str.split method to split on the '@' character and then join the left-hand part with the new domain name:
In [42]:
df = pd.DataFrame({'email':['[email protected]', '[email protected]', '[email protected]']})
df
Out[42]:
email
0 [email protected]
1 [email protected]
2 [email protected]
In [43]:
df['email'] = df.email.str.split('@').str[0] + '@newcompany.com'
df
Out[43]:
email
0 [email protected]
1 [email protected]
2 [email protected]
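A self-contained version of the split approach, runnable outside IPython. The sample addresses are hypothetical. One behaviour worth knowing: a value with no '@' comes back whole from `.str[0]`, so it still gets the new domain appended.

```python
import pandas as pd

# Hypothetical sample addresses, including one value without an '@'.
df = pd.DataFrame({'email': ['alice@oldcompany.com',
                             'bob@example.org',
                             'carol@sub.domain.net',
                             'not-an-email']})

# Split each value on '@' and keep the part before it (the local part) ...
local_part = df['email'].str.split('@').str[0]
# ... then concatenate the new domain element-wise.
df['email'] = local_part + '@newcompany.com'

print(df['email'].tolist())
# ['alice@newcompany.com', 'bob@newcompany.com', 'carol@newcompany.com',
#  'not-an-email@newcompany.com']
```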
Another method is to call the vectorised str.replace, which accepts a regex as the pattern:
In [56]:
df['email'] = df['email'].str.replace(r'@.+', '@newcompany.com', regex=True)
df
Out[56]:
email
0 [email protected]
1 [email protected]
2 [email protected]
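The same regex replacement as a standalone script, with hypothetical sample addresses. The explicit `regex=True` matters: since pandas 2.0, `Series.str.replace` treats the pattern as a literal string by default, so without it `'@.+'` would only match that exact three-character text. Because `.+` is greedy, everything from the first '@' to the end of the value is replaced, and values with no '@' are left unchanged (unlike the split approach).

```python
import pandas as pd

# Hypothetical sample addresses, including one value without an '@'.
df = pd.DataFrame({'email': ['alice@oldcompany.com',
                             'bob@example.org',
                             'not-an-email']})

# '@.+' matches the first '@' and everything after it. regex=True is required
# on pandas >= 2.0, where str.replace defaults to a literal (non-regex) match.
df['email'] = df['email'].str.replace(r'@.+', '@newcompany.com', regex=True)

print(df['email'].tolist())
# ['alice@newcompany.com', 'bob@newcompany.com', 'not-an-email']
```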
Timings
In [58]:
%timeit df['email'] = df['email'].str.replace(r'@.+', '@newcompany.com', regex=True)
1000 loops, best of 3: 632 µs per loop
In [60]:
%timeit df['email'] = df.email.str.split('@').str[0] + '@newcompany.com'
1000 loops, best of 3: 1.66 ms per loop
In [63]:
%timeit df['email'] = df['email'].replace(r'@.+', '@newcompany.com', regex=True)
1000 loops, best of 3: 738 µs per loop
Here the str.replace regex version is roughly 2.6x faster than the split method (632 µs vs 1.66 ms per loop). Interestingly, Series.replace, which would seem to be doing the same thing as str.replace, is slightly slower (738 µs per loop).
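The timings above were taken on the three-row example frame, so they likely reflect per-call overhead more than per-row cost. A rough way to rerun the comparison on more data, and to confirm the three methods agree, is sketched below. The row count and the generated addresses are arbitrary assumptions, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Arbitrary synthetic data: 20,000 hypothetical addresses.
n = 20_000
emails = pd.Series([f'user{i}@oldcompany{i % 7}.com' for i in range(n)])

def via_split(s):
    return s.str.split('@').str[0] + '@newcompany.com'

def via_str_replace(s):
    return s.str.replace(r'@.+', '@newcompany.com', regex=True)

def via_series_replace(s):
    return s.replace(r'@.+', '@newcompany.com', regex=True)

# On well-formed addresses all three approaches should produce the same values
# (compared as lists so differing result dtypes don't matter).
split_out = via_split(emails).tolist()
assert split_out == via_str_replace(emails).tolist()
assert split_out == via_series_replace(emails).tolist()

for name, func in [('split', via_split),
                   ('str.replace', via_str_replace),
                   ('Series.replace', via_series_replace)]:
    # Best of 3 repeats of 5 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(emails), number=5, repeat=3)) / 5
    print(f'{name:15s} {best * 1e3:.2f} ms per call')
```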
This sounds like a job for regex! Pandas' Series.replace lets you use regular expressions; you just have to pass regex=True. You're most of the way there, and the following should work for you:
df_users['EMAIL'] = df_users['EMAIL'].replace('@.*$', '@newcompany.com', regex=True)
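A standalone sketch of this approach with hypothetical sample data. It assigns the result back to the column: calling `replace(..., inplace=True)` on `df_users['EMAIL']` is chained assignment, which recent pandas versions warn about and which may not update `df_users` under Copy-on-Write. The `$` anchor is harmless but not strictly needed, since `.*` already runs to the end of a single-line value.

```python
import pandas as pd

# Hypothetical sample data.
df_users = pd.DataFrame({'EMAIL': ['alice@oldcompany.com', 'bob@example.org']})

# Series.replace with regex=True substitutes the matched part of each string:
# '@.*$' matches from the '@' to the end of the value.
# Assigning back avoids relying on inplace=True through a column selection.
df_users['EMAIL'] = df_users['EMAIL'].replace('@.*$', '@newcompany.com', regex=True)

print(df_users['EMAIL'].tolist())
# ['alice@newcompany.com', 'bob@newcompany.com']
```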