EMC

Reputation: 749

String wildcard in pandas on replace function

I'm sure this problem has an easy answer, but I'm having trouble figuring out the correct pattern to use. I want to replace the domain of every email address in a data frame with a new domain. For a specific column, I want to replace the substring '@*', where * is any sequence of characters, with '@newcompany.com', keeping whatever comes before the @ as is. My attempt so far only matches the '@' itself. Thanks all.

df_users['EMAIL'] = df_users['EMAIL'].str.replace('@', '@newcompany.com')

Upvotes: 2

Views: 9820

Answers (2)

EdChum

Reputation: 394159

You can use the vectorised str methods to split on the '@' character and then join the left-hand part with the new domain name:

In [42]:

df = pd.DataFrame({'email':['[email protected]', '[email protected]', '[email protected]']})
df
Out[42]:
                email
0      [email protected]
1    [email protected]
2  [email protected]

In [43]:

df['email'] = df.email.str.split('@').str[0] + '@newcompany.com'
df

Out[43]:
                     email
0    [email protected]
1     [email protected]
2  [email protected]
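
As a rough, self-contained sketch of the split approach (the addresses below are invented for illustration and are not the ones from the original session), note that a value with no '@' at all still gets the new domain appended, which may or may not be what you want:

```python
import pandas as pd

# Hypothetical addresses, made up for this example
emails = pd.Series(['alice@oldcorp.com', 'bob@mail.example.org', 'no-at-sign'])

# Split each string on '@' and keep the first piece (the local part)
local_part = emails.str.split('@').str[0]

# Append the new domain to every local part
result = local_part + '@newcompany.com'
print(result.tolist())
# ['alice@newcompany.com', 'bob@newcompany.com', 'no-at-sign@newcompany.com']
```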

Another method is to call the vectorised str.replace, which accepts a regex as the pattern to match in each string:

In [56]:

df['email'] = df['email'].str.replace(r'@.+', '@newcompany.com', regex=True)  # regex=True is required on newer pandas, where the pattern is literal by default
df
Out[56]:
                     email
0    [email protected]
1     [email protected]
2  [email protected]
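
A minimal standalone sketch of the regex approach, again with invented addresses. It passes regex=True explicitly, on the assumption that you may be running a pandas version where str.replace treats the pattern as a literal string by default. Unlike the split approach, values without an '@' are left untouched, and missing values stay missing:

```python
import pandas as pd

# Hypothetical addresses, made up for this example
df = pd.DataFrame({'email': ['alice@oldcorp.com', 'bob@mail.example.org', 'no-at-sign', None]})

# '@.+' matches the '@' and everything after it (at least one character)
df['email'] = df['email'].str.replace(r'@.+', '@newcompany.com', regex=True)
print(df)
```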

Timings

In [58]:

%timeit df['email'] = df['email'].str.replace(r'@.+', '@newcompany.com', regex=True)
1000 loops, best of 3: 632 µs per loop
In [60]:

%timeit df['email'] = df.email.str.split('@').str[0] + '@newcompany.com'
1000 loops, best of 3: 1.66 ms per loop

In [63]:

%timeit df['email'] = df['email'].replace(r'@.+', '@newcompany.com', regex=True)
1000 loops, best of 3: 738 µs per loop

Here we can see that the str.replace regex version is roughly 2.6x faster than the split method on this small frame. Interestingly, the Series.replace method, which would seem to be doing the same thing as str.replace, is slightly slower.
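
Timings on a three-row frame are dominated by fixed overhead, so it may be worth repeating the comparison on data closer to your real size. The following is only a sketch using the standard timeit module outside IPython; the 10,000 generated addresses and the function names are assumptions made for illustration, and the numbers will differ by machine and pandas version:

```python
import timeit
import pandas as pd

# Hypothetical data: 10,000 generated addresses
emails = pd.Series(['user{}@oldcorp.com'.format(i) for i in range(10_000)])

def with_str_replace():
    return emails.str.replace(r'@.+', '@newcompany.com', regex=True)

def with_split():
    return emails.str.split('@').str[0] + '@newcompany.com'

def with_series_replace():
    return emails.replace(r'@.+', '@newcompany.com', regex=True)

# Sanity check: all three approaches agree on this data
expected = with_str_replace().tolist()
all_agree = (with_split().tolist() == expected
             and with_series_replace().tolist() == expected)

# Time each approach and report the average per call
runs = 10
for fn in (with_str_replace, with_split, with_series_replace):
    seconds = timeit.timeit(fn, number=runs)
    print('{}: {:.2f} ms per call'.format(fn.__name__, seconds / runs * 1e3))
```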

Upvotes: 4

mattvivier

Reputation: 2291

This sounds like a job for regex! Pandas' replace lets you use regular expressions; you just have to pass regex=True. You're most of the way there, and the following should work for you.

df_users['EMAIL'] = df_users['EMAIL'].replace(r'@.*$', '@newcompany.com', regex=True)  # assign back; inplace=True on a selected column may not update df_users
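
Here is a small sketch of that call on invented data, assigning the result back to the column rather than relying on inplace=True applied to a column selection. One detail worth knowing: because the pattern is '@.*$' rather than '@.+', a value that ends in a bare '@' is also rewritten:

```python
import pandas as pd

# Hypothetical addresses, made up for this example
df_users = pd.DataFrame({'EMAIL': ['alice@oldcorp.com', 'carol@', 'no-at-sign']})

# '@.*$' matches the '@' plus zero or more characters up to the end of the string
df_users['EMAIL'] = df_users['EMAIL'].replace(r'@.*$', '@newcompany.com', regex=True)
print(df_users['EMAIL'].tolist())
# ['alice@newcompany.com', 'carol@newcompany.com', 'no-at-sign']
```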

Upvotes: 3
