Create multiple possible email addresses based on names in Python

Question

Given a dataframe as follows:

  firstname   lastname                     email_address  \
0      Doug     Watson  douglas.watson@dignityhealth.org   
1      Nick   Holekamp    nick.holekamp@rankenjordan.org   
2       Rob  Schreiner        rob.schriener@wellstar.org   
3    Austin   Phillips       austin.phillips@precmed.com   
4     Elise     Geiger               egeiger@puracap.com   
5      Paul      Urick       purick@diplomatpharmacy.com   
6   Michael   Obringer    michael.obringer@lashgroup.com   
7     Craig   Heneghan           cheneghan@west-ward.com   
8     Kathy      Hirst       kathleen.hirst@sunovion.com   
9    Stefan  Bluemmers   stefan.bluemmers@grunenthal.com   

                               companyname  
0                           Dignity Health  
1  Ranken Jordan Pediatric Bridge Hospital  
2                   WellStar Health System  
3         Precision Medical Products, Inc.  
4                              puracap.com  
5              Diplomat Specialty Pharmacy  
6                               Lash Group  
7                West-Ward Pharmaceuticals  
8                 Sunovion Pharmaceuticals  
9                         Grünenthal Group

How can I generate possible email addresses from common email patterns such as firstlast@example.com, first.last@example.com, f.last@example.com, lastF@example.com, first_last@example.com, firstL@example.com, etc.?

df['email1'] = df.firstname.str.lower() + '.' + df.lastname.str.lower() + '@' + df.companyname.str.replace(r'\s+', '', regex=True).str.lower() + '.com'
print(df['email1'])

Out:

0                           doug.watson@dignityhealth.com
1       nick.holekamp@rankenjordanpediatricbridgehospi...  --->problematic
2                  rob.schreiner@wellstarhealthsystem.com
3       austin.phillips@precisionmedicalproducts,inc..com  --->problematic
4                            elise.geiger@puracap.com.com  --->problematic
                              ...                        
9995              terry.hanley@kempersportsmanagement.com
9996                          christine.marks@geocomp.com
9997                               darryl.rickner@doe.com
9998                     lalit.sharma@lovelylifestyle.com
9999                              parul.dutt@infibeam.com

Some of them seem quite problematic. Could anyone help me solve this issue? Thanks a lot.

EDITED:

print(df) after applying @Sajith Herath's solution:

Out:

  firstname  lastname                                        companyname  \
0      Nick  Holekamp  Ranken                                        ...   

                                               email  
0                       nick.                    ...

Sajith Herath · Accepted Answer

You can use a function to create permutations of the username with different separators, and define a maximum length above which the domain is simplified from the company name, as follows:

import re
import random
import pandas as pd

data = {
    "firstname": ["Nick"],
    "lastname": ["Holekamp"],
    "companyname": ["Ranken Jordan Pediatric Bridge Hospital"],
}
df = pd.DataFrame(data=data)

max_char = 5
emails = []

def simplify_domain(text):
    # names longer than max_char are abbreviated to their uppercase initials
    if len(text) > max_char:
        text = ''.join([c for c in text if c.isupper()])
        return text.lower()
    # short names: strip whitespace (plain str.replace does not accept a regex)
    return re.sub(r"\s+", "", text).lower()

def username_permutations(first_name, last_name):
    # define separators
    separators = [".", "_", "-"]
    # lower case
    first, last = first_name.lower(), last_name.lower()
    combinations = [f"{first}{sep}{last}" for sep in separators]

    # append a random number to tail
    n = random.randint(1, 100)
    combinations.extend([f"{x}{n}" for x in combinations])
    return combinations

for index, row in df.iterrows():
    usernames = username_permutations(row["firstname"], row["lastname"])
    domain = simplify_domain(row["companyname"])
    # build each address on one line so no stray whitespace ends up in it
    email_permutations = [f"{x}@{domain}.com" for x in usernames]
    emails.append(','.join(email_permutations))

df["email"] = emails

The final result will look like this (the numeric suffix is random; it was 66 in this run): nick.holekamp@rjpbh.com,nick_holekamp@rjpbh.com,nick-holekamp@rjpbh.com,nick.holekamp66@rjpbh.com,nick_holekamp66@rjpbh.com,nick-holekamp66@rjpbh.com
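
The separator-based permutations do not cover every pattern listed in the question (firstlast, first.last, f.last, lastF, first_last, firstL). A minimal sketch of one way to generate those is shown here. The helper name pattern_usernames is hypothetical, and it assumes plain names without internal spaces or punctuation:

```python
def pattern_usernames(first_name, last_name):
    """Build usernames for the patterns named in the question (hypothetical helper)."""
    first = first_name.strip().lower()
    last = last_name.strip().lower()
    # Slicing instead of indexing avoids an IndexError on empty names
    f, l = first[:1], last[:1]
    return [
        f"{first}{last}",    # firstlast
        f"{first}.{last}",   # first.last
        f"{f}.{last}",       # f.last
        f"{last}{f}",        # lastF
        f"{first}_{last}",   # first_last
        f"{first}{l}",       # firstL
    ]

print(pattern_usernames("Nick", "Holekamp"))
```

Each username can then be joined to a domain in the same way as in the loop over df.iterrows().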

You can modify the simplify_domain function to clean up the given string first, for example by removing "Inc." or ".com".
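
One possible way to do that cleanup is sketched below. The function name clean_company_name, the suffix list and the list of top-level domains are assumptions chosen for illustration, not part of the original solution, and would need extending for real data:

```python
import re

# Hypothetical list of legal suffixes to drop; extend it for your own data
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}
# Hypothetical list of trailing top-level domains to strip
TLD_PATTERN = re.compile(r"\.(com|org|net)$", re.IGNORECASE)

def clean_company_name(name):
    """Strip a trailing TLD, punctuation and legal suffixes from a company name."""
    name = TLD_PATTERN.sub("", name.strip())
    # Replace punctuation (except hyphens) with spaces, e.g. "Products, Inc."
    name = re.sub(r"[^\w\s-]", " ", name)
    words = [w for w in name.split() if w.lower() not in LEGAL_SUFFIXES]
    return " ".join(words)

print(clean_company_name("Precision Medical Products, Inc."))
print(clean_company_name("puracap.com"))
```

Note that simplify_domain as written keeps only uppercase letters when the name is longer than max_char, so an all-lowercase name such as puracap would produce an empty domain; that case needs separate handling, for instance by returning the cleaned name unchanged when it contains no uppercase letters.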
