botchedDevil

Reputation: 270

Regular Expression Group not ending with a string

I have a list of names and their dob

1.uzamaki/narutomr 20mar
2.hyuga/hinata mrs 13apr
3.haruno/sakuramiss 25nov
4.uchiha/sasuke mstr
5.uchiha/itachi akatsuki mr 12feb
6.lee/rock 23jun
7.hatake/kakashi mr 30oct 8.sarutobi/hiruzen mr 31dec

And I need to extract the serial number, firstname, surname, title and dob;

for example in case number 5

number      -> 5
surname     -> uchiha
firstname   -> itachi akatsuki
title       -> mr
dob         -> 12feb

The regex I came up with

/(?<number>\d+)\.(?<surname>[a-z\s]*)\/(?<firstname>[a-z\s]*)(?<title>mrs|mr|miss|mstr)?\s(?<dob>\d{2}[a-z]{3})/giU

This works fine in ungreedy mode, but it fails on lines that do not have a dob. If I try to make the dob group optional by adding a '?', none of the lines match completely.
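To illustrate the failure, here is a rough Python sketch of the pattern above with the dob group made optional. It rests on assumptions: Python's `re` module has no U flag, so every quantifier is written in its lazy form by hand, and named groups use `(?P<name>...)` rather than `(?<name>...)`. With nothing after the optional dob to force the match onward, the lazy quantifiers stop at the first whitespace.

```python
import re

# Assumed port of the question's regex: lazy quantifiers stand in for the
# PCRE U flag, and the trailing dob group is made (lazily) optional.
LAZY_OPTIONAL_DOB = re.compile(
    r"(?P<number>\d+?)\.(?P<surname>[a-z\s]*?)/(?P<firstname>[a-z\s]*?)"
    r"(?P<title>mrs|mr|miss|mstr)??\s(?P<dob>\d{2}[a-z]{3})??",
    re.IGNORECASE,
)

m = LAZY_OPTIONAL_DOB.search("5.uchiha/itachi akatsuki mr 12feb")

# The match ends right after the first space: title and dob are never tried,
# because a lazy optional group is skipped whenever skipping still succeeds.
print(repr(m.group(0)))  # '5.uchiha/itachi '
print(m.groupdict())
```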

So, is it possible to have the firstname group not ending with a title? Can $ be used just in the scope of the group?

I have cooked up a test here http://regex101.com/r/gR7tX2/4

Note: Title and dob are optional groups. Also, there may or may not be a space between the firstname and the title. Valid firstnames that happen to end with a title are therefore special cases and are out of scope for this question.

Upvotes: 1

Views: 84

Answers (2)

anubhava

Reputation: 784998

You can use this regex:

(?<number>\d+)\.(?<surname>[a-z\s]+)/(?<firstname>[a-z\s]+)\s*(?<title>mrs?|miss|mstr)?(?:\s(?<dob>\d{2}[a-z]{3}))?$

RegEx Demo


Update: Based on your edits you can use this regex:

(?<number>\d+)\.(?<surname>[a-z\s]+)/(?<firstname>[a-z\s]+)\s*(?<title>mrs?|miss|mstr)?(?:\s(?<dob>\d{2}[a-z]{3}))? *(?=\d+\.|$)

RegEx Demo2

PS: The flags used are miU (multiline, ignore case, ungreedy).
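For readers who want to try the updated regex outside PCRE, the following is a minimal sketch in Python, not part of the original answer. It assumes a hand translation: Python has no U flag, so each quantifier is flipped to its lazy form, and named groups are spelled `(?P<name>...)`. The `SAMPLE` and `records` names are illustrative only.

```python
import re

# Sample data from the question; the last line holds two items.
SAMPLE = """\
1.uzamaki/narutomr 20mar
2.hyuga/hinata mrs 13apr
3.haruno/sakuramiss 25nov
4.uchiha/sasuke mstr
5.uchiha/itachi akatsuki mr 12feb
6.lee/rock 23jun
7.hatake/kakashi mr 30oct 8.sarutobi/hiruzen mr 31dec"""

# Assumed translation of the miU regex: every quantifier is made lazy by hand.
# The final lookahead forces each item to run up to the next "N." or line end.
ITEM = re.compile(
    r"(?P<number>\d+?)\.(?P<surname>[a-z\s]+?)/(?P<firstname>[a-z\s]+?)\s*?"
    r"(?P<title>mrs??|miss|mstr)??(?:\s(?P<dob>\d{2}[a-z]{3}))?? *?(?=\d+?\.|$)",
    re.MULTILINE | re.IGNORECASE,
)

records = [m.groupdict() for m in ITEM.finditer(SAMPLE)]
for record in records:
    print(record)
```

In this sketch the title is split off even when no space precedes it (for example `narutomr`), because `\s*?` may match nothing before the title group.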

Upvotes: 3

Casimir et Hippolyte

Reputation: 89547

You can use this pattern but without the ugly U modifier:

~(?<number>[0-9]+) \.
 (?<surname>[a-z\s]+) / 
 (?<firstname>[a-z\s]+?) 
 (?: \s+ (?<title>m(?:rs?|iss|str)) )?
 (?: \s+ (?<dob>[0-9]{2}[a-z]{3}) )?
 (?=\s[0-9]+\.|$)
~x

demo

The only quantifier that needs to be non-greedy is the one in the firstname group; its purpose is to trim the trailing spaces without "eating" the title. Since the next two groups are optional, you need a lookahead at the end to force the non-greedy quantifier to keep expanding until it reaches the title or dob part, or the end of the item.
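As a rough check of this pattern, here is a sketch in Python, which is an assumption on my part since the answer targets PCRE. `re.VERBOSE` plays the role of the `x` modifier and named groups become `(?P<name>...)`. The `SAMPLE` and `records` names are illustrative.

```python
import re

# Sample data from the question; the last line holds two items.
SAMPLE = """\
1.uzamaki/narutomr 20mar
2.hyuga/hinata mrs 13apr
3.haruno/sakuramiss 25nov
4.uchiha/sasuke mstr
5.uchiha/itachi akatsuki mr 12feb
6.lee/rock 23jun
7.hatake/kakashi mr 30oct 8.sarutobi/hiruzen mr 31dec"""

ITEM = re.compile(
    r"""
    (?P<number>[0-9]+) \.
    (?P<surname>[a-z\s]+) /
    (?P<firstname>[a-z\s]+?)                 # lazy: grows only as far as needed
    (?: \s+ (?P<title>m(?:rs?|iss|str)) )?   # optional title after whitespace
    (?: \s+ (?P<dob>[0-9]{2}[a-z]{3}) )?     # optional dob after whitespace
    (?=\s[0-9]+\.|$)                         # next item or end of input
    """,
    re.VERBOSE,
)

records = [m.groupdict() for m in ITEM.finditer(SAMPLE)]
for record in records:
    print(record)
```

One thing this sketch makes visible: because the title group requires at least one whitespace character before it, `narutomr` and `sakuramiss` stay whole in firstname and their title comes back as `None`.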

Upvotes: 1
