Reputation: 1144
I found a citation-parsing regular expression at http://linklens.blogspot.com.au/2009/04/citation-parsing-regular-expression.html and it works fine when I test it at http://www.regexr.com, but it doesn't work when I use it with Regex.Matches in C#.
This is the expression (with the quotation marks escaped as \""), as evaluated from C# and re-tested in regexr:
```
/([^e][^d][^s][^\.]\s|\d+\.?\s|^)([A-Z][a-z]{1,},?((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*))(\s?(,|and|&|,\s?and)?\s?([A-Z][a-z]{1,},?((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*)))*\s*(\(?\d\d\d\d\)?\.?)?\s*(\""|“)?((([A-Za-z:,\r\n]{2,}\s?){3,}))\.?(\""|”)?/g
```
Can anyone familiar with regular expressions spot anything in this fairly complex expression that might not be compatible with C#?
Edit:
Link to a regexr example with some sample citations: http://regexr.com/3a232
```csharp
var myMatches = @"/([^e][^d][^s][^\.]\s|\d+\.?\s|^)([A-Z][a-z]{1,},?((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*))(\s?(,|and|&|,\s?and)?\s?([A-Z][a-z]{1,},?((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*)))*\s*(\(?\d\d\d\d\)?\.?)?\s*(""|“)?((([A-Za-z:,\r\n]{2,}\s?){3,}))\.?(""|”)?/g";
var matches = Regex.Matches(TestApp.Properties.Resources.Citation, myMatches);
Console.WriteLine(matches.Count);
```
This prints 0; no matches are found.
Upvotes: 0
Views: 358
Reputation: 700152
You are escaping the quotation marks incorrectly. A quotation mark is never escaped with \"".
In a regular string literal, a quotation mark is escaped as \".
In a verbatim (@-prefixed) string literal, a quotation mark is escaped as "".
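To illustrate the two escaping rules, here is a small sketch (assuming C# 9 or later, since it uses top-level statements) that writes the fragment ("|“) from the question's pattern both ways and checks that the resulting strings are identical:

```csharp
using System;

// The fragment ("|“) from the question's pattern, written two ways.

// Regular string literal: a quotation mark is escaped with a backslash.
string regular = "(\"|“)";

// Verbatim (@) string literal: a quotation mark is escaped by doubling it.
// Backslashes are kept as-is, which is why verbatim strings suit regex patterns.
string verbatim = @"(""|“)";

// Both literals produce the same five characters at run time.
Console.WriteLine(regular == verbatim); // prints True
Console.WriteLine(verbatim.Length);     // prints 5
```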
You should also remove the / from the beginning of the string and the /g from the end of it. They are not part of the pattern; they are the delimiters and flag of a regex literal, the syntax used by languages such as JavaScript (and which doesn't exist in C#, by the way).
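A minimal sketch of the corrected call, assuming C# 9 or later (top-level statements). The sample citation in text is hypothetical and stands in for TestApp.Properties.Resources.Citation, whose contents aren't shown in the question. It runs the question's pattern once wrapped in /.../g, as in the question, and once without the wrapper:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input; stands in for TestApp.Properties.Resources.Citation.
string text = "Smith, J. (2009). Parsing citations with regular expressions.";

// The pattern from the question with the leading "/" and trailing "/g" removed.
// In a verbatim string, "" is an escaped quotation mark.
string pattern = @"([^e][^d][^s][^\.]\s|\d+\.?\s|^)([A-Z][a-z]{1,},?((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*))(\s?(,|and|&|,\s?and)?\s?([A-Z][a-z]{1,},?((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*)))*\s*(\(?\d\d\d\d\)?\.?)?\s*(""|“)?((([A-Za-z:,\r\n]{2,}\s?){3,}))\.?(""|”)?";

// What the question passed: .NET treats "/" and "/g" as literal characters,
// so the input would have to contain them for anything to match.
string wrapped = "/" + pattern + "/g";

// Regex.Matches already returns every match, which is what /g does in JavaScript.
MatchCollection wrong = Regex.Matches(text, wrapped);
MatchCollection right = Regex.Matches(text, pattern);

Console.WriteLine(wrong.Count);    // prints 0
Console.WriteLine(right.Count);    // prints 1
Console.WriteLine(right[0].Value); // prints the whole sample citation
```

There is no RegexOptions counterpart for g because Regex.Matches returns all matches (Regex.Match returns only the first). Other JavaScript flags do have counterparts; for example, i corresponds to RegexOptions.IgnoreCase and m to RegexOptions.Multiline.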
Upvotes: 2