Reputation: 109
I have a large pandas DataFrame of email addresses and want to replace all the .edu emails with "Edu". I came up with a highly inefficient way of doing it, but there has to be a better way. This is how I do it:
import pandas as pd
import re
inp = [{'c1':10, 'c2':'gedua.com'}, {'c1':11,'c2':'wewewe.Edu'}, {'c1':12,'c2':'wewewe.edu.ney'}]
dfn = pd.DataFrame(inp)
for index, row in dfn.iterrows():
    try:
        if len(re.search(r'\.edu', row['c2']).group(0)) > 1:
            dfn.c2[index] = 'Edu'
            print('Education')
    except:
        continue
Upvotes: 2
Views: 1068
Reputation: 403278
Use str.contains for case-insensitive selection, and assign with loc.
dfn.loc[dfn.c2.str.contains(r'\.Edu', case=False), 'c2'] = 'Edu'
dfn
   c1         c2
0  10  gedua.com
1  11        Edu
2  12        Edu
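As a self-contained sketch of the mask-and-assign idea above: the extra row with a missing value is hypothetical (it is not in the question's data), added only to show that na=False keeps the mask strictly boolean if the real column has gaps.

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12, 13],
    # The None entry is a hypothetical missing value, not part of the original data.
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney', None],
})

# True where c2 contains ".edu" in any letter case.
# na=False treats missing values as "no match" instead of propagating NaN.
mask = dfn['c2'].str.contains(r'\.edu', case=False, na=False)

# Overwrite only the matching rows of column c2.
dfn.loc[mask, 'c2'] = 'Edu'
print(dfn)
```

Building the mask once and assigning through loc avoids the row-by-row loop and the chained assignment in the question.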
If you only want to replace the emails ending with .edu, then
dfn.loc[dfn.c2.str.contains(r'\.Edu$', case=False), 'c2'] = 'Edu'
Or, as suggested by piR (note that str.endswith is case-sensitive, so this matches '.Edu' but not '.edu'),
dfn.loc[dfn.c2.str.endswith('.Edu'), 'c2'] = 'Edu'
dfn
   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
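Since str.endswith compares literally, one possible way to make the suffix check case-insensitive is to lowercase the column first. This is only a sketch; the fourth address is a made-up example added to show a lowercase .edu suffix being caught.

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12, 13],
    # 'someone@school.edu' is a hypothetical address, not from the question.
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney', 'someone@school.edu'],
})

# Lowercase a temporary copy of the values, then test the literal suffix.
mask = dfn['c2'].str.lower().str.endswith('.edu')

dfn.loc[mask, 'c2'] = 'Edu'
print(dfn)
```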
Upvotes: 3
Reputation: 294576
replace
dfn.replace(r'^.*\.Edu$', 'Edu', regex=True)
   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
The pattern '^.*\.Edu$' says: grab everything from the beginning of the string up to the point where we find '.Edu' followed by the end of the string, then replace that whole thing with 'Edu'.
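To see what that pattern does outside of pandas, here is a small sketch using the standard re module on the three sample strings from the question.

```python
import re

pattern = r'^.*\.Edu$'
samples = ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney']

# re.sub replaces the whole match; strings that do not match are returned unchanged.
results = [re.sub(pattern, 'Edu', s) for s in samples]
print(results)
```

Only the string that ends in '.Edu' with that exact casing is rewritten, because the pattern is anchored at both ends and is case-sensitive.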
You may want to limit the scope to just one column (or several). You can do that by passing a dictionary to replace, where the outer key specifies the column and the inner dictionary maps each pattern to its replacement.
dfn.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)
   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
pandas.DataFrame.replace does not have a case flag, but you can embed one in the pattern with '(?i)'.
dfn.replace({'c2': {r'(?i)^.*\.edu$': 'Edu'}}, regex=True)
   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
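One thing worth keeping in mind: by default replace returns a new DataFrame and leaves the original untouched, so the result has to be assigned somewhere. A minimal sketch, where the name result is just an illustrative choice:

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney'],
})

# (?i) makes the pattern case-insensitive; the outer key limits it to column c2.
result = dfn.replace({'c2': {r'(?i)^.*\.edu$': 'Edu'}}, regex=True)

print(result)  # the replaced copy
print(dfn)     # the original, still unchanged
```

Writing dfn = dfn.replace(...) instead would keep the change under the original name.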
Upvotes: 2