Reputation: 5234
I want to write a function that parses an email input and returns a tuple (id, domain), where id is the user name and domain is the domain name. The two parts are separated by the @ character.
For example, kyle@asu.edu would parse to ('kyle', 'asu.edu'). But below are some additional constraints on the function:
So if any of the above rules are violated, then the email input is not considered a valid email address and should raise a ValueError.
Below is my attempted code that doesn't quite work:
def email_func(string_input):
    """Parses a string as an email address, returning an (id, domain) pair."""
    ###
    ### YOUR CODE HERE
    regex_parse = re.search(r'([a-zA-Z_+-.]+)@([a-zA-Z.-]+)', string_input)
    # print (regex_parse)
    try:
        return regex_parse.groups()
    except ValueError:
        raise ValueError ('not a valid email address')
    ###
For a simple example it works.
email_func('kyle@asu.edu') returns `('kyle', 'asu.edu')`, which is correct.
Instances where my code doesn't work:
For invalid input strings with white spaces I'm not raising a ValueError. For example:
email_func('kyle @asu.edu')
outputs an error:
---> 11 return regex_parse.groups()
AttributeError: 'NoneType' object has no attribute 'groups'
I'm also not getting a ValueError for leading whitespace. For example, email_func(' kyle@asu.edu')
outputs ('kyle', 'asu.edu')
Same issue with trailing white spaces.
How do I specify in my regex that the email can't start or end with a number, i.e. that it has to start and end with an alphabetic character?
Upvotes: 2
Views: 576
Reputation: 41
The function below returns a tuple (username, domain). The except block runs if any of your constraints are violated. Modify as needed.
import re

def email_func(string_input):
    username_template = r'^[a-z][a-z\d_\.+=-]{0,30}@'
    domain_template = r'@\w*\.[a-z]+'
    try:
        username = re.search(username_template, string_input)
        domain = re.search(domain_template, string_input)
        # .group(0) is the whole match, including the '@', which strip removes
        email_tuple = (username.group(0).strip('@'), domain.group(0).strip('@'))
        return email_tuple
    except AttributeError:
        # Reached when either search found no match and returned None
        print('please enter a valid email.')
To catch the username, the regex below checks that it starts with a Latin letter, ^[a-z], and then matches the rest of the username. If ^[a-z] is violated, the except block will trigger.
username_template = r'^[a-z][a-z\d_\.+=-]{0,30}@'
To catch the domain, the regex below checks that everything following the . is a Latin letter and not a digit: \.[a-z]+
domain_template = r'@\w*\.[a-z]+'
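As written, the function prints a message and returns None on invalid input, and the domain pattern is not anchored at the end, so trailing whitespace still passes. Here is a minimal sketch of a variant that raises ValueError instead, assuming the same two templates. The name email_func_strict and the trailing $ on the domain pattern are illustrative additions, not part of the answer above:

```python
import re

# Same username template as above; must start with a lowercase Latin letter.
USERNAME_RE = re.compile(r'^[a-z][a-z\d_\.+=-]{0,30}@')
# Assumption: a trailing `$` is added so nothing may follow the domain.
DOMAIN_RE = re.compile(r'@\w*\.[a-z]+$')

def email_func_strict(string_input):
    """Return (username, domain) or raise ValueError (hypothetical variant)."""
    username = USERNAME_RE.search(string_input)
    domain = DOMAIN_RE.search(string_input)
    if username is None or domain is None:
        raise ValueError('please enter a valid email.')
    # group(0) includes the '@' on each side, so strip it off.
    return (username.group(0).strip('@'), domain.group(0).strip('@'))
```

Note that with the $ anchor a multi-level domain such as cs.asu.edu is rejected, since \w* does not match dots. Whether that is acceptable depends on the constraints you need.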
Upvotes: 0
Reputation: 781721
regex_parse.groups() raises AttributeError, not ValueError, when regex_parse is None, which is what re.search() returns when it can't find a match. So change except ValueError: to except AttributeError:. Or you could simply use
    if regex_parse is None:
        raise ValueError("Not a valid email address")
To reject leading and trailing whitespace, anchor the regex: r'^([a-zA-Z_+-.]+)@([a-zA-Z.-]+)$'. ^ matches the beginning, $ matches the end.
To require a letter at the start and at the end, put [a-zA-Z] at both ends: r'([a-zA-Z][a-zA-Z_+-.]*)@([a-zA-Z.-]*[a-zA-Z])'
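Putting these pieces together, here is a minimal sketch that assumes the anchors and the letter-at-both-ends pattern are combined into one regex. One change is an assumption rather than part of the answer: inside a character class, +-. is a range from + to ., which also admits a comma, so the hyphen is moved to the end of the class to make it literal.

```python
import re

# Anchored pattern: id starts with a letter, domain ends with a letter.
# Assumption: '-' is placed last in the first class so it is a literal hyphen.
EMAIL_RE = re.compile(r'^([a-zA-Z][a-zA-Z_+.-]*)@([a-zA-Z.-]*[a-zA-Z])$')

def email_func(string_input):
    """Parses a string as an email address, returning an (id, domain) pair."""
    regex_parse = EMAIL_RE.search(string_input)
    if regex_parse is None:
        raise ValueError("Not a valid email address")
    return regex_parse.groups()
```

With this version, inputs such as ' kyle@asu.edu', 'kyle@asu.edu ' and 'kyle @asu.edu' all raise ValueError, as do addresses that start or end with a digit.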
Upvotes: 1
Reputation: 365
I assume that your input is only one email address and you need to validate it. So there is no need to use search. What you are really looking for is the match function.
With small changes to your code, it looks like this:
import re

def email_func(string_input):
    """Parses a string as an email address, returning an (id, domain) pair."""
    ###
    ### YOUR CODE HERE
    regex_parse = re.match(r'([a-zA-Z_+-.]+)@([a-zA-Z.-]+)', string_input)
    # print (regex_parse)
    if regex_parse:
        return regex_parse.groups()
    else:
        raise ValueError ('not a valid email address')
    ###
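One caveat: re.match only anchors at the start of the string, so trailing characters such as whitespace are still accepted. Here is a minimal sketch using re.fullmatch (available since Python 3.4), which requires the whole string to match. The pattern is the one from the question, and the name email_func_fullmatch is hypothetical:

```python
import re

# Same pattern as in the question, unchanged.
PATTERN = r'([a-zA-Z_+-.]+)@([a-zA-Z.-]+)'

def email_func_fullmatch(string_input):
    """Return (id, domain) only if the entire string matches the pattern."""
    regex_parse = re.fullmatch(PATTERN, string_input)
    if regex_parse:
        return regex_parse.groups()
    raise ValueError('not a valid email address')

# re.match accepts a valid prefix and ignores the trailing space;
# re.fullmatch rejects the same input.
prefix_only = re.match(PATTERN, 'kyle@asu.edu ')
whole_string = re.fullmatch(PATTERN, 'kyle@asu.edu ')
```

Here prefix_only is a match object while whole_string is None, which is the difference that matters for trailing whitespace.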
Upvotes: 1