Ben Swann

Reputation: 133

Extract domain names from multiple email addresses in Data Frame

I am trying to extract multiple domain names from the following data frame:

    email
0   [email protected]; [email protected]
1   [email protected]; [email protected]
2   [email protected]

I can extract the domain of the first email address in each row using the following code, but any later addresses in the same row are left intact:

orig = []
mylist = []
for i in df['email']:
    orig.append(i)
    i = i[ i.find("@") : ]
    i = i.split(";")
    i = ';'.join(i)
    mylist.append(i)

After appending the lists to a data frame I get the following result:

    origemail                           newemail
0   [email protected]; [email protected]  @gmail1.com; [email protected]
1   [email protected]; [email protected]  @gmail3.com; [email protected]
2   [email protected]                    @gmail5.com

The result I am after (a row is not limited to two email addresses; there could be more):

    origemail                           newemail
0   [email protected]; [email protected]  @gmail1.com; @gmail2.com
1   [email protected]; [email protected]  @gmail3.com; @gmail4.com
2   [email protected]                    @gmail5.com

Can someone please point me in the right direction to achieve the desired output? Thanks in advance.

Upvotes: 0

Views: 658

Answers (2)

Lee Garcon

Reputation: 192

Something like this should work:

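orig = []
mylist = []
for i in df['email']:
    orig.append(i)
    # Split on ';' and trim the whitespace around each address
    emails = [x.strip() for x in i.split(';')]
    # Keep everything from the '@' onward, e.g. '@gmail1.com'
    domains = [x[x.find('@'):] for x in emails]
    # join() returns a plain string even for a single address,
    # so no special case is needed
    mylist.append('; '.join(domains))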

It (1) loops through all the emails, (2) appends them first to orig, (3) finds the domains, then (4) concatenates them and appends them to mylist.

Upvotes: 3

Lam Kwok Shing

Reputation: 71

The for loop in your code needs to be refactored like so:

  1. append the current item to the original list
  2. split all emails by semi-colon ';'
  3. trim white space for each email
  4. find the '@' sign and extract the substring of the domain
  5. join all domains with ';'
  6. append the result to mylist

Hope this pseudocode helps.
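The steps above could be sketched in Python roughly as follows. This is only a minimal illustration: the helper name extract_domains and the sample addresses are made up for the example (the addresses in the question are obscured), and the domains are joined with '; ' rather than a bare ';' so the result matches the spacing in the desired output.

```python
def extract_domains(cell, sep=";"):
    """Return the '@domain' part of every address in a delimited string."""
    domains = []
    for address in cell.split(sep):      # step 2: split by semicolon
        address = address.strip()        # step 3: trim white space
        if not address:
            continue                     # skip empty pieces, e.g. a trailing ';'
        at = address.find("@")           # step 4: locate the '@' sign
        if at == -1:
            continue                     # not an email address, ignore it
        domains.append(address[at:])     # keep '@' plus the domain
    return "; ".join(domains)            # step 5: join all domains


# Hypothetical sample rows standing in for df['email']
emails = [
    "alice@gmail1.com; bob@gmail2.com",
    "carol@gmail3.com; dave@gmail4.com",
    "eve@gmail5.com",
]

orig = []
mylist = []
for cell in emails:
    orig.append(cell)                      # step 1: keep the original value
    mylist.append(extract_domains(cell))   # step 6: store the joined domains

print(mylist)
```

If the data lives in a pandas data frame, the same helper can probably be applied directly with df['email'].apply(extract_domains), which avoids the explicit loop.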

Upvotes: 0
