I am trying to extract multiple domain names from the following data frame:
email
0 [email protected]; [email protected]
1 [email protected]; [email protected]
2 [email protected]
I can split the string and extract the domain of the first email address using the following code:
orig = []
mylist = []
for i in df['email']:
    orig.append(i)
    i = i[i.find("@"):]
    i = i.split(";")
    i = ';'.join(i)
    mylist.append(i)
After appending the lists to a data frame, I get the following result:
   origemail                           newemail
0  [email protected]; [email protected]  @gmail1.com; [email protected]
1  [email protected]; [email protected]  @gmail3.com; [email protected]
2  [email protected]                    @gmail5.com
The result I am after (a row may contain more than two email addresses, not just two):
   origemail                           newemail
0  [email protected]; [email protected]  @gmail1.com; @gmail2.com
1  [email protected]; [email protected]  @gmail3.com; @gmail4.com
2  [email protected]                    @gmail5.com
Can someone please point me in the right direction to achieve the desired output? Thanks in advance.
Something like this should work:
orig = []
mylist = []
for i in df['email']:
    orig.append(i)
    # Split the cell into individual addresses, then keep everything from '@' on
    emails = i.strip().split(';')
    domains = [x[x.find('@'):] for x in emails if '@' in x]
    # join() also covers the single-address case and always returns a string
    domain_string = '; '.join(domains)
    mylist.append(domain_string)
It (1) loops through all the email cells, (2) appends each original string to orig, (3) splits it on ';' and finds the domain of each address, then (4) joins the domains with '; ' and appends the result to mylist.
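As a possible alternative to the explicit loop, here is a sketch assuming pandas is available and the column is named email as in the question. Series.str.findall collects every "@domain" match in each cell, and Series.str.join glues the matches back together with "; ". The sample addresses are hypothetical placeholders (the real ones in the question are obfuscated), and the regex assumes addresses are separated by ";" and contain no whitespace.

```python
import pandas as pd

# Hypothetical sample data: the local parts ("a", "b", ...) are placeholders.
df = pd.DataFrame({
    "email": [
        "a@gmail1.com; b@gmail2.com",
        "c@gmail3.com; d@gmail4.com",
        "e@gmail5.com",
    ]
})

# Find every "@" followed by characters that are neither ";" nor whitespace,
# giving a list of domains per row, then join each list with "; ".
df["newemail"] = df["email"].str.findall(r"@[^;\s]+").str.join("; ")

print(df)
```

This handles any number of addresses per cell without a Python-level loop, because the regex simply returns one match per address.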
The for loop in your code needs to be refactored like so:
Hope this pseudocode helps.
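One way to refactor the loop along these lines is sketched below; it is an illustration, not this answer's original code. It splits each cell on ";" before extracting anything, uses str.partition to keep the "@domain" part of each address, skips empty pieces such as one left by a trailing ";", and builds the origemail/newemail data frame described in the question. The sample addresses are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical sample data (placeholder local parts).
df = pd.DataFrame({"email": ["a@gmail1.com; b@gmail2.com",
                             "c@gmail3.com; d@gmail4.com",
                             "e@gmail5.com"]})

orig = []
mylist = []
for cell in df["email"]:
    orig.append(cell)
    domains = []
    # Split into individual addresses first, then extract each domain.
    for address in cell.split(";"):
        address = address.strip()
        if "@" in address:  # skip empty or malformed pieces
            _, at, domain = address.partition("@")
            domains.append(at + domain)  # e.g. "@gmail1.com"
    mylist.append("; ".join(domains))

result = pd.DataFrame({"origemail": orig, "newemail": mylist})
print(result)
```

The key change from the question's code is the order of operations: splitting on ";" before searching for "@" means every address is processed, not just the first one.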