Reputation: 5234

Python: Regex Function Parsing Through Email String and Returning Tuple Or Returning ValueError If Input Invalid

I want to write a function that parses an email address and returns a tuple (id, domain), where id is the username and domain is the domain name. The two parts are separated by the @ character.

For example, kyle@asu.edu would parse to ('kyle', 'asu.edu'). The function also has some additional constraints:

the username begins with an alphabetic character
the domain name ends with an alphabetic character
special characters such as ., -, _, and + are allowed
no whitespace characters are permitted, including leading or trailing whitespace

So if any of the above rules is violated, the input is not a valid email address and the function should raise a ValueError.

Below is my attempted code that doesn't quite work:

import re

def email_func(string_input):
    """Parses a string as an email address, returning an (id, domain) pair."""
    ###
    ### YOUR CODE HERE
    regex_parse = re.search(r'([a-zA-Z_+-.]+)@([a-zA-Z.-]+)', string_input) 
    # print (regex_parse)
    
    try:
        return regex_parse.groups()
    
    except ValueError:
        raise ValueError ('not a valid email address')
    ###

For a simple example it works.

email_func('kyle@asu.edu') returns `('kyle', 'asu.edu')`, which is correct.

Instances where my code doesn't work:

For invalid input strings containing whitespace, I'm not getting a ValueError. For example, email_func('kyle @asu.edu') raises a different error:

---> 11 return regex_parse.groups()
AttributeError: 'NoneType' object has no attribute 'groups'
I'm not getting a ValueError for leading whitespace. For example, email_func(' kyle@asu.edu') returns ('kyle', 'asu.edu'). The same issue occurs with trailing whitespace.
How do I specify in my regex that the email can't start or end with a number, i.e. that it has to start and end with an alphabetic character?
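Both failures can be reproduced by calling re.search directly with the pattern from the question: it returns None when nothing matches (so calling .groups() on the result raises AttributeError), and because it may match anywhere in the string, leading and trailing spaces are simply skipped over. A minimal demonstration:

```python
import re

# The pattern from the question, unchanged.
PATTERN = r'([a-zA-Z_+-.]+)@([a-zA-Z.-]+)'

# Nothing matches: re.search returns None, and None has no .groups() method,
# which is where the AttributeError comes from.
no_match = re.search(PATTERN, 'kyle @asu.edu')
print(no_match)  # None

# re.search may start matching anywhere, so the leading space is skipped.
leading = re.search(PATTERN, ' kyle@asu.edu')
print(leading.groups())  # ('kyle', 'asu.edu')

# Nothing forces the match to reach the end, so a trailing space is ignored too.
trailing = re.search(PATTERN, 'kyle@asu.edu ')
print(trailing.groups())  # ('kyle', 'asu.edu')
```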

Upvotes: 2

Answers (3)

nahar

Reputation: 41

The function below should return a (username, domain) tuple. If any of your constraints is violated, the except block is triggered. Modify as needed.

import re

def email_func(string_input):

    username_template = r'^[a-z][a-z\d_\.+=-]{0,30}@'
    domain_template = r'@\w*\.[a-z]+'

    try:
        username = re.search(username_template, string_input)
        domain = re.search(domain_template, string_input)

        email_tuple = (username.group(0).strip('@'), domain.group(0).strip('@'))

        return email_tuple

    except AttributeError:
        print('please enter a valid email.')

To catch the username, the regex below checks that it starts with a Latin letter (^[a-z]) and then matches the rest of the username. If ^[a-z] is violated, the except block will trigger.

username_template = r'^[a-z][a-z\d_\.+=-]{0,30}@'

To catch the domain, the regex below checks that everything following the . is a Latin letter and not a digit: \.[a-z]+

domain_template = r'@\w*\.[a-z]+'
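To see what each template actually accepts, the sketch below runs both through re.search on a few inputs; check is a hypothetical helper written only for this demonstration. It shows that the username template is case-sensitive and rejects leading whitespace, while neither template is anchored at the end of the string.

```python
import re

# The two templates from this answer, unchanged.
username_template = r'^[a-z][a-z\d_\.+=-]{0,30}@'
domain_template = r'@\w*\.[a-z]+'

def check(s):
    """Hypothetical helper: return the (username, domain) matched text, or None for no match."""
    u = re.search(username_template, s)
    d = re.search(domain_template, s)
    return (u.group(0) if u else None, d.group(0) if d else None)

print(check('kyle@asu.edu'))   # ('kyle@', '@asu.edu')
print(check(' kyle@asu.edu'))  # (None, '@asu.edu'): ^ rejects the leading space
print(check('1kyle@asu.edu'))  # (None, '@asu.edu'): must start with a letter
print(check('Kyle@asu.edu'))   # (None, '@asu.edu'): only lowercase [a-z] is allowed
print(check('kyle@asu.edu '))  # ('kyle@', '@asu.edu'): no end anchor, so the trailing space passes
```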

Upvotes: 0

Barmar

Reputation: 781721

As you can clearly see, calling regex_parse.groups() raises AttributeError, not ValueError, when regex_parse is None, which is what is returned when re.search() can't find a match. So change except ValueError: to except AttributeError:. Or you could simply use

if regex_parse is None: 
    raise ValueError("Not a valid email address")
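Applied to the function from the question, the None check might look like the sketch below. It only fixes the exception type, so leading and trailing whitespace are still accepted until the pattern is anchored. One change is assumed beyond this answer: in the class [a-zA-Z_+-.], the +-. is read as a character range from + to . (which also admits a comma), so the hyphen is moved to the end of the class to make it literal.

```python
import re

def email_func(string_input):
    """Parses a string as an email address, returning an (id, domain) pair."""
    # Same pattern as the question, except '-' is moved to the end of the
    # first character class so it is a literal hyphen, not part of a range.
    regex_parse = re.search(r'([a-zA-Z_+.-]+)@([a-zA-Z.-]+)', string_input)
    # re.search returns None when there is no match; turn that into ValueError.
    if regex_parse is None:
        raise ValueError("Not a valid email address")
    return regex_parse.groups()
```

With this version, email_func('kyle @asu.edu') raises ValueError instead of AttributeError, but email_func(' kyle@asu.edu') still returns ('kyle', 'asu.edu').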

You should anchor your regexp so it has to match the entire string, rather than searching for a match anywhere in the string: r'^([a-zA-Z_+.-]+)@([a-zA-Z.-]+)$'. ^ matches the beginning and $ matches the end. (Put the - last in the character class; in [a-zA-Z_+-.] the +-. is treated as a range.)
Start and end the regexp with [a-zA-Z]. Combined with the anchors:
r'^([a-zA-Z][a-zA-Z_+.-]*)@([a-zA-Z.-]*[a-zA-Z])$'
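Putting both suggestions together, a sketch of the complete function might look like this. It uses re.fullmatch (available since Python 3.4) rather than ^...$, because $ also matches just before a trailing newline, so 'kyle@asu.edu\n' would otherwise be accepted. The hyphen is placed last in each character class so that +-. is not read as a range; EMAIL_RE is a name chosen here for illustration.

```python
import re

# Hypothetical compiled pattern: username starts with a letter, domain ends with
# a letter; '-' is last in each class so it is a literal hyphen.
EMAIL_RE = re.compile(r'([a-zA-Z][a-zA-Z_+.-]*)@([a-zA-Z.-]*[a-zA-Z])')

def email_func(string_input):
    """Parses a string as an email address, returning an (id, domain) pair."""
    # fullmatch requires the pattern to cover the entire string, so any leading
    # or trailing whitespace (including a final newline) causes a failure.
    regex_parse = EMAIL_RE.fullmatch(string_input)
    if regex_parse is None:
        raise ValueError('not a valid email address')
    return regex_parse.groups()
```

For example, email_func('kyle@asu.edu') returns ('kyle', 'asu.edu'), while ' kyle@asu.edu', 'kyle@asu.edu ', '1kyle@asu.edu' and 'kyle@asu.edu1' all raise ValueError.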

Upvotes: 1

Yulian

Reputation: 365

I assume that your input is a single email address that you need to validate, so there is no need to use re.search. What you are really looking for is re.match.

With small changes to your code, it looks like this:

import re

def email_func(string_input):
    """Parses a string as an email address, returning an (id, domain) pair."""
    ###
    ### YOUR CODE HERE
    regex_parse = re.match(r'([a-zA-Z_+.-]+)@([a-zA-Z.-]+)', string_input)
    # print (regex_parse)
    
    if regex_parse:
        return regex_parse.groups()
    
    else:
        raise ValueError ('not a valid email address')
    ###
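The difference between re.search, re.match and re.fullmatch is easy to check directly. This sketch uses the question's pattern with the hyphen moved to the end of the first class; it shows that re.match rejects leading whitespace but, because it anchors only at the start, still accepts trailing whitespace, which re.fullmatch rejects.

```python
import re

# The question's pattern, with '-' moved to the end of the first class.
PATTERN = r'([a-zA-Z_+.-]+)@([a-zA-Z.-]+)'

# re.search scans the whole string, so a leading space is simply skipped.
searched = re.search(PATTERN, ' kyle@asu.edu')
print(searched.groups())  # ('kyle', 'asu.edu')

# re.match only tries position 0, so the leading space makes it fail.
matched_leading = re.match(PATTERN, ' kyle@asu.edu')
print(matched_leading)  # None

# re.match does not anchor the end, so trailing whitespace still passes.
matched_trailing = re.match(PATTERN, 'kyle@asu.edu ')
print(matched_trailing.groups())  # ('kyle', 'asu.edu')

# re.fullmatch must consume the entire string, so the trailing space fails.
full_trailing = re.fullmatch(PATTERN, 'kyle@asu.edu ')
print(full_trailing)  # None
```

So re.match handles the leading-whitespace case from the question, but re.fullmatch (or ending the pattern with \Z) is needed to reject trailing whitespace as well.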

Upvotes: 1
