Reputation: 11489
I'm trying to parse some source files for some standard information.
The source files could look like this:
// Name: BoltBait
// Title: Some cool thing
or
// Name :
// Title : Another thing
or
// Title:
// Name:
etc.
The code I'm using to parse for the information looks like this:
Regex REName = new Regex(@"\/{2}\s*Name\s*:\s*(?<nlabel>.*)\n", RegexOptions.IgnoreCase);
Match mname = REName.Match(ScriptText); // entire source code file
if (mname.Success)
{
Name.Text = mname.Groups["nlabel"].Value.Trim();
}
This works fine if the field has information, but it doesn't work if the field is left blank.
For example, in the third example above, the Title field returns a match of "// Name:" and I want it to return the empty string.
I need help from a regex expert.
I thought the regex was too greedy, so I tried the following expression:
@"\/{2}\s*Name\s*:\s*(?<nlabel>.*?)\n"
However, it didn't help.
Upvotes: 2
Views: 172
Reputation: 1541
My approach is to use an alternation in a non-capturing group to match the label from the colon to the end of the line. This matches either anything up to the end of the line, or nothing.
var text1 = "// Name: BoltBait" + Environment.NewLine + "// Title: Some cool thing" + Environment.NewLine;
var text2 = "// Name :" + Environment.NewLine + "// Title : Another thing" + Environment.NewLine;
var text3 = "// Title:" + Environment.NewLine + "// Name:" + Environment.NewLine;
var texts = new List<string>() { text1, text2, text3 };
var options = RegexOptions.IgnoreCase | RegexOptions.Multiline;
var regex = new Regex("^//\\s*?Name\\s*?:(?<nlabel>(?:.*$|$))", options );
foreach (var text in texts){
var match = regex.Match( text );
Console.WriteLine( "|" + match.Groups["nlabel"].Value.Trim() + "|" );
}
Produces:
|BoltBait|
||
||
Upvotes: 0
Reputation: 627469
You can also use a class subtraction to avoid matching newline symbols:
//[\s-[\r\n]]*Name[\s-[\r\n]]*:[\s-[\r\n]]*(?<nlabel>.*)(?=\r?\n|$)
Note that:
[\s-[\r\n]]* - Matches any whitespace excluding newline symbols (a character class subtraction is used).
(?=\r?\n|$) - A positive look-ahead that checks if there is a line break or the end of the string.
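As a rough sketch of how the pattern above could be wired into the parsing code, assuming a small helper of my own naming (HeaderFields.Read is hypothetical and not from the question): it builds the same expression for any field name and trims the result. Note that in .NET the dot can match \r, so with Windows line endings the captured value may end in a carriage return, which Trim() removes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderFields
{
    // Hypothetical helper: returns the trimmed value of "// <field>: value",
    // "" when the field is present but blank, or null when it is missing.
    public static string Read(string source, string field)
    {
        // [\s-[\r\n]]* is "whitespace minus CR/LF", so the whitespace runs
        // in the pattern cannot cross into the next line.
        string pattern = @"//[\s-[\r\n]]*" + Regex.Escape(field)
                       + @"[\s-[\r\n]]*:[\s-[\r\n]]*(?<label>.*)(?=\r?\n|$)";
        Match m = Regex.Match(source, pattern, RegexOptions.IgnoreCase);
        // '.' can match '\r' in .NET, so Trim() strips a trailing CR.
        return m.Success ? m.Groups["label"].Value.Trim() : null;
    }

    public static void Main()
    {
        string script = "// Title:\r\n// Name:\r\n";
        Console.WriteLine("[" + Read(script, "Name") + "]");  // []
        Console.WriteLine("[" + Read(script, "Title") + "]"); // []
    }
}
```

Returning null for a missing field and "" for a blank one is just one possible convention; it lets the caller tell the two cases apart.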
Upvotes: 1
Reputation: 6852
\s includes line breaks, which is not wanted here.
It should suffice to match tabs and spaces explicitly after the colon:
\/{2}\s*Name\s*:[\t ]*(?<nlabel>.*?)\n
This returns the empty string correctly in your third example (for both name and title).
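To make the difference visible, here is a small side-by-side sketch for the Title field. The class and method names (BlankFieldDemo, Capture) are made up for illustration; the first pattern is the one from the question with Name swapped for Title, the second is the one above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlankFieldDemo
{
    // Pattern from the question: \s* after the colon can also consume the
    // line break, so .* ends up capturing the following line.
    public const string Original = @"\/{2}\s*Title\s*:\s*(?<label>.*)\n";

    // Same pattern, but only tabs and spaces are allowed after the colon.
    public const string Fixed = @"\/{2}\s*Title\s*:[\t ]*(?<label>.*?)\n";

    // Hypothetical helper: trimmed capture, or null when there is no match.
    public static string Capture(string pattern, string source)
    {
        Match m = Regex.Match(source, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["label"].Value.Trim() : null;
    }

    public static void Main()
    {
        string script = "// Title:\n// Name:\n";
        Console.WriteLine("[" + Capture(Original, script) + "]"); // [// Name:]
        Console.WriteLine("[" + Capture(Fixed, script) + "]");    // []
    }
}
```

Both patterns still require a trailing \n, so a field on the very last line of a file without a final newline would not match either one.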
Upvotes: 1