Reputation: 307
I'm trying to read a log file and extract some machine/setting information using regular expressions. Here is a sample from the log:
...
COMPUTER INFO:
Computer Name: TESTCMP02
Windows User Name: testUser99
Time Since Last Reboot: 405 Minutes
Processor: (2 processors) Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
OS Version: 5.1 .number 2600:Service Pack 2
Memory: RAM: 48% used, 3069.6 MB total, 1567.3 MB free
ServerTimeOffSet: -146 Seconds
Use Local Time for Log: True
INITIAL SETTINGS:
Command Line: /SKIPUPDATES
Remote Online: True
INI File: c:\demoapp\system\DEMOAPP.INI
DatabaseName: testdb
SQL Server: 10.254.58.1
SQL UserName: SQLUser
ODBC Source: TestODBC
Dynamic ODBC (not defined): True
...
I would like to capture each 'block' of data, using the header as one group and the data as a second (i.e. "COMPUTER INFO", "Computer Name:......."), and repeat this for each block. The expression I have so far is:
(?s)(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n)
This pulls out the block into the groups like it should, which is great. But I need to have it repeat the capture, which I can't seem to get. I've tried several grouping expressions, including:
(?s)(?:(\p{Lu}{1,} \p{Lu}{1,}:\r\n)(.*\r\n\r\n))*
which would seem to be correct, but I get back lots of NULL result groups with empty group item values. I'm using the .NET Regex class to apply the expressions. Can anyone help me out here?
Upvotes: 7
Views: 5949
Reputation: 43426
Some links regarding repeating groups in regular expressions:
Upvotes: 1
Reputation: 7491
Here is how I would go about it. This would allow you to get the value of a specific group easily, but the expression would be a bit more complicated. I added line breaks to make it easier to read. Here is the start:
COMPUTER INFO:.*
Computer Name:\s*(?<ComputerName>[\w\s]+).*
Windows User Name:\s*(?<WindowUserName>[\w\s]+).*
Time Since Last Reboot:\s*(?<TimeSinceLastReboot>[\w\s]+).*
(?# This continues on through each of the lines... )
with the Compiled, IgnoreCase, Singleline, and CultureInvariant options.
Then you would be able to read each value through the named groups, for example:
string computerName = match.Groups["ComputerName"].Value;
string windowUserName = match.Groups["WindowUserName"].Value;
// etc.
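To make the idea concrete, here is a minimal sketch of the named-group approach covering only the first three fields. The class name ComputerInfoReader and the method Read are hypothetical names for illustration. One deliberate substitution: the values are captured with [^\r\n]+ instead of [\w\s]+, on the assumption that each value sits on its own line, because [\w\s]+ can run across the line break into the next field name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class ComputerInfoReader
{
    // Only the first three fields are spelled out; further fields would follow the same shape.
    // [^\r\n]+ is a substitution for [\w\s]+ so that each capture stops at the end of its line.
    static readonly Regex InfoRegex = new Regex(
        @"COMPUTER INFO:.*Computer Name:\s*(?<ComputerName>[^\r\n]+)" +
        @".*Windows User Name:\s*(?<WindowUserName>[^\r\n]+)" +
        @".*Time Since Last Reboot:\s*(?<TimeSinceLastReboot>[^\r\n]+)",
        RegexOptions.Compiled | RegexOptions.IgnoreCase |
        RegexOptions.Singleline | RegexOptions.CultureInvariant);

    // Returns the value of the named group, or null if the block or the group is not found.
    public static string Read(string logData, string groupName)
    {
        Match match = InfoRegex.Match(logData);
        if (!match.Success) return null;

        Group group = match.Groups[groupName];
        return group.Success ? group.Value.Trim() : null;
    }
}
```

Calling ComputerInfoReader.Read(logData, "ComputerName") would then return the computer name from the COMPUTER INFO block. Note that this approach needs every field to be present and in the expected order, otherwise the whole match fails.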
Upvotes: 1
Reputation: 19661
It's not possible to get a separate group for each repetition. A repeated group's Value will contain only the last match.
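As a side note specific to .NET: the earlier iterations of a repeated group are not lost entirely, because each Group also exposes a Captures collection. The sketch below shows the difference on a toy input; the class name RepeatedGroupDemo, the group name word, and the pattern are all made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, only to show Group.Value versus Group.Captures.
public static class RepeatedGroupDemo
{
    // The named group "word" sits inside a repeated non-capturing group.
    const string Pattern = @"^(?:(?<word>\w+)\s*)+$";

    // Returns every value the repeated group captured, in order of appearance.
    public static string[] AllCaptures(string input)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Groups["word"].Captures
                    .Cast<Capture>()
                    .Select(capture => capture.Value)
                    .ToArray();
    }

    // Returns what Group.Value exposes: only the last iteration.
    public static string LastValue(string input)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Groups["word"].Value;
    }
}
```

Captures gives you a flat list per group, so pairing each header with its own fields still takes extra bookkeeping. Splitting the work into two steps tends to be easier to follow.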
You'll need to break this into two problems. First, find each section:
new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);
And then, within each match, use another regex to match each field/value into groups:
new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);
The code to use this would look something like this:
Regex sectionRegex = new Regex(@"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*", RegexOptions.Singleline | RegexOptions.Multiline);
Regex nameValueRegex = new Regex(@"^\s+(?<name>[^:]*):\s*(?<value>.*)$", RegexOptions.Multiline);
MatchCollection sections = sectionRegex.Matches(logData);
foreach (Match section in sections)
{
    MatchCollection nameValues = nameValueRegex.Matches(section.ToString());
    foreach (Match nameValue in nameValues)
    {
        string name = nameValue.Groups["name"].Value;
        string value = nameValue.Groups["value"].Value;
        // OK, do something here.
    }
}
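For completeness, here is one way the loop could be wrapped into a self-contained helper that returns the sections as nested dictionaries. Treat it as a sketch: LogSectionParser and Parse are hypothetical names, and it assumes the field lines are indented under each header (which the ^\s+ in the second pattern requires). The Trim() calls are an addition, because with \r\n line endings the value group would otherwise keep the trailing \r.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the two regexes from the answer.
public static class LogSectionParser
{
    static readonly Regex SectionRegex = new Regex(
        @"(?>^[A-Z\s]+:\s*$)\s*(?:(?!^\S).)*",
        RegexOptions.Singleline | RegexOptions.Multiline);

    static readonly Regex NameValueRegex = new Regex(
        @"^\s+(?<name>[^:]*):\s*(?<value>.*)$",
        RegexOptions.Multiline);

    // Maps each section header to its field name/value pairs.
    public static Dictionary<string, Dictionary<string, string>> Parse(string logData)
    {
        var result = new Dictionary<string, Dictionary<string, string>>();

        foreach (Match section in SectionRegex.Matches(logData))
        {
            string text = section.Value;

            // The header is everything before the first colon of the section.
            string header = text.Substring(0, text.IndexOf(':')).Trim();

            var fields = new Dictionary<string, string>();
            foreach (Match nameValue in NameValueRegex.Matches(text))
            {
                // Trim() drops the trailing '\r' that '.' picks up on CRLF input.
                string name = nameValue.Groups["name"].Value.Trim();
                string value = nameValue.Groups["value"].Value.Trim();
                fields[name] = value;
            }

            result[header] = fields;
        }

        return result;
    }
}
```

With this in place, LogSectionParser.Parse(logData)["COMPUTER INFO"]["Computer Name"] would give the computer name. A repeated header or field name simply overwrites the earlier entry, which may or may not be what you want for your logs.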
Upvotes: 12
Reputation: 28425
((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)+
or, if you have empty lines between items:
(((?<header>[^:]+:)(?<content>[^\r\n]+)?\r\n)|\r\n)+
Upvotes: 1