Reputation: 1
I'm writing a program to find who a book was printed for. I am given an imprint line and have to extract the names. An imprint line does not contain a fixed number of people: the book can be printed for one person or for several.
Here is an example of an imprint line:
"[[London] : Finished in Ianuarie 1587, and the 29 of the Queenes Maiesties reigne, with the full continuation of the former yeares, for Iohn Harison, George Bishop, Rafe Newberie, Henrie Denham, and Thomas VVoodcocke. At London printed [by Henry Denham] in Aldersgate street at the signe of the Starre,"
I have a regex that will match "Iohn Harison, George Bishop, Rafe Newberie, Henrie Denham, and Thomas VVoodcocke. At London" in the above line.
The problem is that the regex, as written, runs on into the next sentence, because that sentence starts with a capital letter, which the name regex also matches. I also cannot simply stop at a period, because the names can be a list of initials: J.D., K.G., & V.X.
The string name is meant to match any format a name can take.
name will match: ( John | John Day | John Wayne Day | John-Day | J.D. | John | J. | J.D | .J.D. | mcJohn Day ). Each name must contain a capital letter, and a name can be composed of multiple words.
Here is the current code:
string line = imprint_line;
string name("(\\s[a-z]*[A-Z\\.]+[a-z\\.:-]*)+");
regex reg("[Ff]or"+name+"((,|,?\\sand|\\s&)?"+name+")*");
smatch matches;
if (regex_search(line, matches, reg))
    printedFor = matches[0];
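For anyone who wants to reproduce the overrun, the snippet above can be wrapped into a small self-contained function. This is only a sketch: the function name `printedForMatch` is hypothetical and introduced here for illustration, while the pattern itself is the one shown above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original program): returns the full
// match of the "for <names>" pattern, or an empty string if nothing matches.
std::string printedForMatch(const std::string& line) {
    // One or more "name words": whitespace, an optional lowercase prefix,
    // at least one capital letter or period, then lowercase letters,
    // periods, colons, or hyphens.
    const std::string name("(\\s[a-z]*[A-Z\\.]+[a-z\\.:-]*)+");

    // "for"/"For", a name, then any number of further names, each one
    // optionally preceded by ",", ", and", " and", or " &".
    const std::regex reg("[Ff]or" + name + "((,|,?\\sand|\\s&)?" + name + ")*");

    std::smatch matches;
    if (std::regex_search(line, matches, reg))
        return matches[0].str();
    return "";
}
```

Called on the sample imprint line, this should return the text from "for Iohn Harison" through "At London": the trailing period is absorbed by the last name word ("VVoodcocke."), and "At" and "London" both start with a capital, so they are accepted as further name words.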
I want to change reg so that it looks ahead for one of the separators: "," or "and" or "&" or ", and".
I was trying something like this:
regex reg("[Ff]or"+name+"(?=(,|,?\\sand|,?\\s&))"+name+")*");
but this returns a regex error. Is there some way I can do this?
Thanks in advance for all the help.
Upvotes: 0
Views: 299
Reputation:
This is your current regex cleaned up a bit.
I can't figure out why you need the lookahead though.
Can you explain better?
[Ff] or
(?: \s [a-z]* [A-Z.]+ [a-z.:-]* )+
(?:
(?: , | ,? \s and | \s & )?
(?: \s [a-z]* [A-Z.]+ [a-z.:-]* )+
)*
Here is the error you are getting:
[Ff] or
(?:
\s [a-z]* [A-Z.]+ [a-z.:-]*
)+
(?= , | ,? \s and | ,? \s & )
(?:
\s [a-z]* [A-Z.]+ [a-z.:-]*
)+
= ) <-- Unbalanced ')'
*
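As a quick check of the unbalanced ')' diagnosis, the sketch below tries to construct both patterns with std::regex and reports whether construction succeeds. The names `compiles`, `unbalancedPattern`, and `balancedPattern` are hypothetical, and the balanced variant is only one possible way to close the groups (opening a non-capturing group before the lookahead), not necessarily what you intended.

```cpp
#include <regex>
#include <string>

// The name sub-pattern exactly as given in the question.
const char* const kName = "(\\s[a-z]*[A-Z\\.]+[a-z\\.:-]*)+";

// Hypothetical helper: true if std::regex accepts the pattern,
// false if construction throws std::regex_error.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// The attempted pattern from the question: the final ")*" has no
// matching "(" because the lookahead group is already closed.
std::string unbalancedPattern() {
    return std::string("[Ff]or") + kName +
           "(?=(,|,?\\sand|,?\\s&))" + kName + ")*";
}

// One assumed repair: wrap lookahead, separator, and name in "(?: ... )*".
// The separator is repeated after the lookahead because a lookahead
// asserts without consuming any input.
std::string balancedPattern() {
    return std::string("[Ff]or") + kName +
           "(?:(?=,|,?\\sand|,?\\s&)(?:,|,?\\sand|,?\\s&)" + kName + ")*";
}
```

With these definitions, `compiles(unbalancedPattern())` is expected to be false and `compiles(balancedPattern())` true. Note that once the separator has to be consumed anyway, the lookahead in front of it adds nothing, which is why it is unclear what the lookahead is meant to achieve.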
Upvotes: 1