Extract domain names from multiple email addresses in Data Frame

Question

I am trying to extract multiple domain names from the following data frame:

    email
0   test1@gmail1.com; test1@gmail2.com
1   test3@gmail3.com; test4@gmail4.com
2   test5@gmail5.com

I can split and extract the first email address using the following code:

orig = []
mylist = []
for i in df['email']:
    orig.append(i)
    i = i[ i.find("@") : ]
    i = i.split(";")
    i = ';'.join(i)
    mylist.append(i)

After appending the lists to a data frame I get the following result:

    origemail                           newemail
0   test1@gmail1.com; test1@gmail2.com  @gmail1.com; test1@gmail2.com
1   test3@gmail3.com; test4@gmail4.com  @gmail3.com; test4@gmail4.com
2   test5@gmail5.com                    @gmail5.com

The result I am after (a cell is not limited to two email addresses; there could be more):

    origemail                           newemail
0   test1@gmail1.com; test1@gmail2.com  @gmail1.com; @gmail2.com
1   test3@gmail3.com; test4@gmail4.com  @gmail3.com; @gmail4.com
2   test5@gmail5.com                    @gmail5.com

Can someone please point me in the right direction to achieve the desired output? Thanks in advance.

Lee Garcon · Accepted Answer

Something like this should work:

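orig = []
mylist = []
for i in df['email']:
    orig.append(i)
    # Split the cell into its individual addresses
    emails = i.strip().split(';')
    # Keep everything from '@' onward; the slice also drops any leading space
    domains = [x[x.find('@'):] for x in emails]
    # join() returns a plain string even when there is only one domain
    domain_string = '; '.join(domains)
    mylist.append(domain_string)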

It (1) loops over every cell in the email column, (2) appends the original string to orig, (3) splits the cell on ';' and slices each address from its '@' onward, then (4) joins the domains with '; ' and appends the result to mylist.
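
If the explicit loop is not needed, the same extraction could be written with a regular expression. This is only a sketch, not part of the original answer: the helper name extract_domains and the pattern are assumptions, and the pattern treats a domain as everything from '@' up to the next ';' or whitespace.

```python
import re

# Hypothetical helper (not from the answer above).
# Matches '@' followed by any run of characters that are not ';' or whitespace.
DOMAIN_RE = re.compile(r'@[^;\s]+')

def extract_domains(cell):
    """Return the '@domain' parts of a ';'-separated address string, joined by '; '."""
    return '; '.join(DOMAIN_RE.findall(cell))

print(extract_domains('test1@gmail1.com; test1@gmail2.com'))  # @gmail1.com; @gmail2.com
print(extract_domains('test5@gmail5.com'))                    # @gmail5.com
```

With the data frame from the question, this could be applied as df['newemail'] = df['email'].apply(extract_domains). The pandas string methods should give the same result without a helper: df['email'].str.findall(r'@[^;\s]+').str.join('; '). A cell with no '@' yields an empty string here, whereas the find-based slice would return the last character of the text.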
