ah bon

Reputation: 10051

Create multiple possible email addresses based on names in Python

Given a dataframe as follows:

  firstname   lastname                     email_address  \
0      Doug     Watson  [email protected]   
1      Nick   Holekamp    [email protected]   
2       Rob  Schreiner        [email protected]   
3    Austin   Phillips       [email protected]   
4     Elise     Geiger               [email protected]   
5      Paul      Urick       [email protected]   
6   Michael   Obringer    [email protected]   
7     Craig   Heneghan           [email protected]   
8     Kathy      Hirst       [email protected]   
9    Stefan  Bluemmers   [email protected]   

                               companyname  
0                           Dignity Health  
1  Ranken Jordan Pediatric Bridge Hospital  
2                   WellStar Health System  
3         Precision Medical Products, Inc.  
4                              puracap.com  
5              Diplomat Specialty Pharmacy  
6                               Lash Group  
7                West-Ward Pharmaceuticals  
8                 Sunovion Pharmaceuticals  
9                         Grünenthal Group  

How can I generate candidate email addresses from common email patterns, such as: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], etc.? My attempt so far:

df['email1'] = df.firstname.str.lower() + '.' + df.lastname.str.lower() + '@' + df.companyname.str.replace(r'\s+', '', regex=True).str.lower() + '.com'
print(df['email1'])

Out:

0                           [email protected]
1       nick.holekamp@rankenjordanpediatricbridgehospi...  --->problematic
2                  [email protected]
3       austin.phillips@precisionmedicalproducts,inc..com  --->problematic
4                            [email protected]  --->problematic
                              ...                        
9995              [email protected]
9996                          [email protected]
9997                               [email protected]
9998                     [email protected]
9999                              [email protected]

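Some of them seem quite problematic: the domain can be far too long, contain punctuation such as "," and ".", or end up with a doubled ".com". Could anyone help me solve this issue? Thanks a lot.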

EDITED:

print(df) after applying @Sajith Herath's solution:

Out:

  firstname  lastname                                        companyname  \
0      Nick  Holekamp  Ranken                                        ...   

                                               email  
0                       nick.                    ...  

Upvotes: 1

Views: 867

Answers (1)

Sajith Herath

Reputation: 1052

You can write one function that builds username permutations with different separators, and another that simplifies the company name into a domain by abbreviating it when it exceeds a maximum length, as follows:

import random
import re

import pandas as pd

data = {"firstname": ["Nick"], "lastname": ["Holekamp"],
        "companyname": ["Ranken Jordan Pediatric Bridge Hospital"]}
df = pd.DataFrame(data=data)

max_char = 5
emails = []

def simplify_domain(text):
    # Names longer than max_char are abbreviated to their capital letters
    if len(text) > max_char:
        text = ''.join([c for c in text if c.isupper()])
        return text.lower()
    # Short names are kept as-is, with whitespace removed
    return re.sub(r"\s+", "", text).lower()

def username_permutations(first_name, last_name):
    # define separators
    separators = [".", "_", "-"]
    # lower case, one username per separator
    combinations = [f"{first_name.lower()}{sep}{last_name.lower()}"
                    for sep in separators]

    # append a random number to the tail
    n = random.randint(1, 100)
    combinations.extend([f"{x}{n}" for x in combinations])
    return combinations

for index, row in df.iterrows():
    usernames = username_permutations(row["firstname"], row["lastname"])
    domain = simplify_domain(row["companyname"])
    email_permutations = [f"{x}@{domain}.com" for x in usernames]
    emails.append(','.join(email_permutations))

df["email"] = emails

The final result is a single comma-separated string per row: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected]

You can modify the simplify_domain function to clean the given string further, for example by removing "Inc." or ".com".
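As a rough sketch of that kind of cleanup, the function below strips a trailing ".com"-style ending, drops legal suffixes such as "Inc", removes punctuation and accents, and falls back to word initials when the result is too long. The name clean_domain, the suffix list, the handled endings, and the 20-character limit are all assumptions made for illustration, and the result is only a guess at a domain, not a verified one.

```python
import re
import unicodedata

# Hypothetical suffix list for illustration; extend it to suit your data.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}
# Only a few common endings are handled here (an assumption).
TLD_PATTERN = re.compile(r"\.(com|org|net)$", re.IGNORECASE)

def clean_domain(company_name, max_char=20):
    """Turn a company name into a guessed domain label (without '.com')."""
    # Remove an ending such as ".com" so it is not doubled later.
    name = TLD_PATTERN.sub("", company_name.strip())
    # Strip accents: "Grünenthal" becomes "Grunenthal".
    name = unicodedata.normalize("NFKD", name)
    name = name.encode("ascii", "ignore").decode("ascii")
    # Split into alphanumeric words, which drops "," "." "-" and spaces.
    words = re.findall(r"[A-Za-z0-9]+", name)
    words = [w for w in words if w.lower() not in LEGAL_SUFFIXES]
    joined = "".join(words).lower()
    if len(joined) <= max_char:
        return joined
    # Too long: fall back to the initial of each remaining word.
    return "".join(w[0] for w in words).lower()

for company in ["Precision Medical Products, Inc.", "puracap.com",
                "Ranken Jordan Pediatric Bridge Hospital"]:
    print(company, "->", clean_domain(company) + ".com")
```

Unlike the capital-letter approach, this does not return an empty string for an all-lowercase name such as "puracap.com". To use it in the loop, call clean_domain(row["companyname"]) in place of simplify_domain.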

Upvotes: 1
