Reputation: 270
I have a list of names and their dates of birth (dob):
1.uzamaki/narutomr 20mar
2.hyuga/hinata mrs 13apr
3.haruno/sakuramiss 25nov
4.uchiha/sasuke mstr
5.uchiha/itachi akatsuki mr 12feb
6.lee/rock 23jun
7.hatake/kakashi mr 30oct 8.sarutobi/hiruzen mr 31dec
I need to extract the serial number, firstname, surname, title and dob.
For example, for item number 5:
number -> 5
surname -> uchiha
firstname -> itachi akatsuki
title -> mr
dob -> 12feb
The regex I came up with:
/(?<number>\d+)\.(?<surname>[a-z\s]*)\/(?<firstname>[a-z\s]*)(?<title>mrs|mr|miss|mstr)?\s(?<dob>\d{2}[a-z]{3})/giU
This works fine in ungreedy mode, but it fails on lines that do not have a dob. If I try to make the dob optional by adding a '?', none of the items match completely.
So, is it possible to keep the firstname group from ending with a title? Can $ be used just within the scope of the group?
I have set up a test here: http://regex101.com/r/gR7tX2/4
Note: Title and dob are optional groups. Also, there may or may not be a space between the firstname and the title. Thus valid firstnames that end with a title are special cases and are out of scope for this question.
Upvotes: 1
Views: 84
Reputation: 784998
You can use this regex:
(?<number>\d+)\.(?<surname>[a-z\s]+)/(?<firstname>[a-z\s]+)\s*(?<title>mrs?|miss|mstr)?(?:\s(?<dob>\d{2}[a-z]{3}))?$
Update: Based on your edits you can use this regex:
(?<number>\d+)\.(?<surname>[a-z\s]+)/(?<firstname>[a-z\s]+)\s*(?<title>mrs?|miss|mstr)?(?:\s(?<dob>\d{2}[a-z]{3}))? *(?=\d+\.|$)
PS: The flags used are miU (multiline, ignore case, ungreedy).
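For readers without a PCRE engine at hand, here is a rough sketch of how the second regex might be ported to Python's `re` module. This is an assumption-laden translation, not part of the original answer: Python has no U flag, so each quantifier is written in its lazy form by hand, and named groups use the `(?P<name>...)` syntax. The names `ITEM`, `parse_items` and `sample` are made up for illustration.

```python
import re

# Hand-made port of the second regex above (an illustration, not the original).
# Python's re has no U (ungreedy) flag, so every variable quantifier is
# inverted manually: + -> +?, * -> *?, ? -> ??.
ITEM = re.compile(
    r"(?P<number>\d+?)\.(?P<surname>[a-z\s]+?)/(?P<firstname>[a-z\s]+?)\s*?"
    r"(?P<title>mrs??|miss|mstr)??(?:\s(?P<dob>\d{2}[a-z]{3}))?? *?(?=\d+\.|$)",
    re.IGNORECASE | re.MULTILINE,  # the "i" and "m" flags
)


def parse_items(text):
    """Return one dict per item found; missing optional groups are None."""
    return [m.groupdict() for m in ITEM.finditer(text)]


# A few items from the question, including the line that holds two items.
sample = (
    "1.uzamaki/narutomr 20mar\n"
    "4.uchiha/sasuke mstr\n"
    "7.hatake/kakashi mr 30oct 8.sarutobi/hiruzen mr 31dec"
)

for item in parse_items(sample):
    print(item)
```

With this sample, the title is split off even when it is glued to the firstname (`narutomr`), and the line that holds items 7 and 8 yields two separate matches because of the `(?=\d+\.|$)` lookahead.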
Upvotes: 3
Reputation: 89547
You can use this pattern, which does not need the ugly U modifier:
~(?<number>[0-9]+) \.
(?<surname>[a-z\s]+) /
(?<firstname>[a-z\s]+?)
(?: \s+ (?<title>m(?:rs?|iss|str)) )?
(?: \s+ (?<dob>[0-9]{2}[a-z]{3}) )?
(?=\s[0-9]+\.|$)
~x
The only useful non-greedy quantifier is the one in the firstname group; its purpose is to trim the trailing spaces without "eating" the title. Since the next two groups are optional, you need a lookahead at the end to force the non-greedy quantifier to keep expanding until it reaches the title, the dob, or the end of the item.
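To see the effect of the lookahead, here is a small sketch that compiles the pattern with and without it. It assumes Python's `re` module instead of PCRE, so named groups are written `(?P<name>...)` and the `x` modifier becomes `re.VERBOSE`; the names `BODY`, `LOOKAHEAD`, `without_lookahead`, `with_lookahead` and `line` are only illustrative.

```python
import re

# The pattern from this answer, ported to Python syntax (an assumption:
# the original targets PCRE). re.VERBOSE plays the role of the x modifier.
BODY = r"""
    (?P<number>[0-9]+) \.
    (?P<surname>[a-z\s]+) /
    (?P<firstname>[a-z\s]+?)
    (?: \s+ (?P<title>m(?:rs?|iss|str)) )?
    (?: \s+ (?P<dob>[0-9]{2}[a-z]{3}) )?
"""
LOOKAHEAD = r"(?=\s[0-9]+\.|$)"

without_lookahead = re.compile(BODY, re.VERBOSE)
with_lookahead = re.compile(BODY + LOOKAHEAD, re.VERBOSE)

line = "5.uchiha/itachi akatsuki mr 12feb"

# No lookahead: the lazy firstname stops after one character, and both
# optional groups are skipped because nothing forces them to match.
print(without_lookahead.search(line).groupdict())

# With the lookahead: firstname has to grow until the title, the dob,
# or the end of the item can follow.
print(with_lookahead.search(line).groupdict())
```

Without the lookahead, firstname is just `i` and both title and dob come back empty. With it, firstname is `itachi akatsuki`, title is `mr` and dob is `12feb`. Note that, as written, the title group requires at least one whitespace character before the title, so an item such as `1.uzamaki/narutomr 20mar` keeps `mr` inside the firstname; relaxing `\s+` to `\s*` in that group may be one way to handle it.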
Upvotes: 1