Reputation: 6848
With a given string: Surname,MM,Forename,JTA19 R <[email protected]>
I can match all the groups with this:
([A-Za-z]+),([A-Z]+),([A-Za-z]+),([A-Z0-9]+)\s([A-Z])\s<([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})
However, when I use it in Python, it never finds a match:
regex=re.compile(r"(?P<lastname>[A-Za-z]+),"
r"(?P<initials>[A-Z]+)"
r",(?P<firstname>[A-Za-z]+),"
r"(?P<ouc1>[A-Z0-9]+)\s"
r"(?P<ouc2>[A-Z])\s<"
r"(?P<email>[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})"
)
I think I've narrowed it down to this part of the email group:
[A-Z0-9._%+-]
What is wrong?
Upvotes: 0
Views: 76
Reputation: 174624
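Python joins adjacent string literals into one string, so compile does receive a single regular expression here; splitting the pattern across lines is not what breaks the match. If you want the pattern spread over several lines, you can also write it as one verbose expression: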
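import re

# Raw string so that \s reaches the regex engine unchanged
exp = r'''
(?P<lastname>[A-Za-z]+),
(?P<initials>[A-Z]+),
(?P<firstname>[A-Za-z]+),
(?P<ouc1>[A-Z0-9]+)\s
(?P<ouc2>[A-Z])\s<
(?P<email>[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})'''
# re.VERBOSE ignores the whitespace and newlines inside the pattern
regex = re.compile(exp, re.VERBOSE)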
Although I have to say, your string is just comma separated, so this might be a bit easier:
>>> s = "Surname,MM,Forename,JTA19 R <[email protected]>"
>>> lastname,initials,firstname,rest = s.split(',')
>>> ouc1,ouc2,email = rest.split(' ')
>>> lastname,initials,firstname,ouc1,ouc2,email[1:-1]
('Surname', 'MM', 'Forename', 'JTA19', 'R', '[email protected]')
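The same split-based idea can be wrapped in a small helper. This is only a sketch: the name parse_record and the sample address are made up for illustration, and it assumes every line has exactly the six fields shown in the question.

```python
def parse_record(line):
    """Split 'Surname,MM,Forename,OUC1 OUC2 <email>' into its six fields."""
    # maxsplit=3 keeps any later commas inside the final chunk
    lastname, initials, firstname, rest = line.split(",", 3)
    # split() with no separator tolerates repeated spaces
    ouc1, ouc2, email = rest.split()
    # strip the angle brackets around the address
    return lastname, initials, firstname, ouc1, ouc2, email.strip("<>")


# Hypothetical address, used only for illustration
sample = "Surname,MM,Forename,JTA19 R <forename.surname@example.com>"
print(parse_record(sample))
```

A line with a different number of fields raises ValueError during unpacking, which may be what you want if malformed input should not pass silently.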
Upvotes: 1
Reputation: 6326
Replace
r"(?P<email>[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})"
with
r"(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4})"
to allow for lowercase letters too.
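To see the difference, here is a minimal comparison of the two email groups. The sample address is invented for illustration (the real one is not shown in the question), and the names PREFIX, UPPER_ONLY and MIXED_CASE are just labels for this sketch.

```python
import re

# Hypothetical sample line; the address is made up for illustration
sample = "Surname,MM,Forename,JTA19 R <forename.surname@example.com>"

# Everything before the email group, taken from the question's pattern
PREFIX = (
    r"(?P<lastname>[A-Za-z]+),"
    r"(?P<initials>[A-Z]+),"
    r"(?P<firstname>[A-Za-z]+),"
    r"(?P<ouc1>[A-Z0-9]+)\s"
    r"(?P<ouc2>[A-Z])\s<"
)
UPPER_ONLY = r"(?P<email>[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})"
MIXED_CASE = r"(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4})"

before = re.compile(PREFIX + UPPER_ONLY).match(sample)
after = re.compile(PREFIX + MIXED_CASE).match(sample)

print(before)             # None: the lowercase address cannot match [A-Z...]
print(after.groupdict())  # all six named groups, including the email
```

Passing re.IGNORECASE to the original pattern would also make the email group match, but it would loosen the uppercase-only groups (initials, ouc1, ouc2) as well.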
Upvotes: 1