Reputation: 771
I am using Python and I want to keep the domain of an email address but remove the TLD ('.com', '.co.uk', '.us', etc.).
So basically, if I have an email, say [email protected], I want only @gmail left as a string, and I want this to work for any email. So [email protected] would leave me with @yahoo, and [email protected] would leave me with @aol.
So far I have:
import re
domain = re.search(r"@[\w.]+", val)
domain = domain.group()
That returns the domain but with the TLD still attached, e.g. @gmail.com or @aol.co
Upvotes: 3
Views: 1161
Reputation: 1570
For posterity and completeness, this can also be done via index and slice:
email = '[email protected]'
at = email.index('@')
dot = email.index('.', at)
domain = email[at:dot]
Using split() and re seems like overkill when the goal is to extract a single substring.
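A minimal sketch of the same index-and-slice idea wrapped in a function. It uses str.find instead of str.index, so a string with no '@', or with no dot after the '@', does not raise ValueError. The function name domain_label and the sample addresses are hypothetical, not from the original post.

```python
def domain_label(email):
    """Return '@' plus the text up to the first dot after it, or None."""
    at = email.find('@')          # -1 if there is no '@'
    if at == -1:
        return None
    dot = email.find('.', at)     # first dot at or after the '@'
    if dot == -1:
        return email[at:]         # no dot: keep everything from '@' onward
    return email[at:dot]


print(domain_label('alice@gmail.com'))    # @gmail
print(domain_label('bob@aol.co.uk'))      # @aol
print(domain_label('not-an-email'))       # None
```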
Upvotes: 0
Reputation: 863166
With pandas string functions, use split:
df = pd.DataFrame({'a':['[email protected]','[email protected]','[email protected]']})
print (df)
a
0 [email protected]
1 [email protected]
2 [email protected]
print ('@' + df.a.str.split('@').str[1].str.split('.', 1).str[0] )
0 @yahoo
1 @aol
2 @aol
Name: a, dtype: object
But apply is faster, provided the column contains no NaN values:
df = pd.concat([df]*10000).reset_index(drop=True)
print ('@' + df.a.str.split('@').str[1].str.split('.', 1).str[0] )
print (df.a.apply(lambda x: '@' + x.split('@')[1].split('.')[0]))
In [363]: %timeit ('@' + df.a.str.split('@').str[1].str.split('.', 1).str[0] )
10 loops, best of 3: 79.1 ms per loop
In [364]: %timeit (df.a.apply(lambda x: '@' + x.split('@')[1].split('.')[0]))
10 loops, best of 3: 27.7 ms per loop
Another solution uses extract, which is faster than split and can also be used if the column contains NaN values:
# not sure this pattern covers all valid characters in an email address
print ( '@' + df.a.str.extract(r"\@([A-Za-z0-9_]+)\.", expand=False))
In [365]: %timeit ( '@' + df.a.str.extract(r"\@([A-Za-z0-9_]+)\.", expand=False))
10 loops, best of 3: 39.7 ms per loop
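The character class in the extract pattern above does not match hyphens, which can appear in domain names. A minimal sketch of one alternative, assuming it is acceptable to take every character after '@' up to the first dot; the sample data and the variable name labels are hypothetical. Including the '@' inside the capture group avoids the separate string concatenation.

```python
import pandas as pd

# Hypothetical sample data, including a missing value and a hyphenated domain
df = pd.DataFrame({'a': ['alice@yahoo.com', None, 'bob@my-mail.co.uk']})

# Capture '@' plus every following character that is neither '@' nor '.',
# and require a literal dot right after the captured text
labels = df['a'].str.extract(r'(@[^@.]+)\.', expand=False)
print(labels.tolist())
```

Rows with a missing value stay missing in the result instead of raising an error.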
Upvotes: 1
Reputation: 4493
First split on "@" and take the part after it. Then split that on "." and take the first part:
email = "[email protected]"
'@' + email.split("@")[1].split(".")[0]
'@gmail'
Upvotes: 2
Reputation: 394
You can do:
val = string.split('@')[1].split('.')[0]
Replace 'string' with the name of the variable that holds your email string.
This will take everything after the '@' symbol, then everything up to the first '.'
Using this on '[email protected]' gives 'gmail'.
If you require the '@' symbol, you can add it back with:
full = '@' + val
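The same rule (everything after the '@', up to the first '.') can also be written as a single regular expression, which stays close to the re.search attempt in the question. A minimal sketch; the names DOMAIN_LABEL and domain_label_re and the sample addresses are hypothetical.

```python
import re

# '@' followed by one or more characters that are neither '.' nor '@'
DOMAIN_LABEL = re.compile(r'@[^.@]+')


def domain_label_re(email):
    """Return '@' plus the first domain label, or None if nothing matches."""
    match = DOMAIN_LABEL.search(email)
    return match.group() if match else None


print(domain_label_re('alice@gmail.com'))  # @gmail
print(domain_label_re('bob@aol.co.uk'))    # @aol
```

Unlike the chained split calls, this returns None rather than raising IndexError when the string contains no '@'.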
Upvotes: 3