RvdV79

Reputation: 2022

Trying to find a match within multiple lines with help of a regex

This might be a simple question for quite a lot of people, but it still is a puzzle to me (perhaps as I am a complete n00b on regular expressions).

I am working to find a regular expression that might help find mistakes in huge files of logging information.

Basically, I need to find a number, always starting with a Z, followed by exactly 11 digits. Consider Z00000012345 as an example.

This number appears in multiple log lines, for example:

Line 144: 07:16:36:933 | Important event received: number arrived: Z00000012345
Line 162: 07:16:42:314 | Processing and doing extremely important stuff...
Line 164: 07:16:42:374 | Almost ready with processing number Z00000012345
Line 165: 07:16:42:374 | Success with processing; number 'Z00000012345' has been processed.

What I need to find:
Sometimes the number that has been processed (the number between single quotes) differs from the number that arrived at the system (first line).

The other tricky thing is that the number of lines in between is not fixed.

I would like to set it up with groups, as that might make the comparison easiest, so I started with:

(?<Found>(\barrived:\s)(\w+))

My goal was to first capture the word right after 'arrived:' and then find the next group that matches the same word, but between single quotes (as seen in the last line).

However, how can I do that easily? Ultimately, I would like to bring this into a C# tool.

By the way, the files run up to 8 gigabytes in size, hence my focus on speed.

Desired output:
The desired output is a flag whenever the first number (see line 144 in the example) does not match the final number on line 165. If they differ, I have a mismatch. As this is very rare, I thought it would be best to search for the mismatch directly.

Upvotes: 2

Views: 119

Answers (2)

Oliver Hao

Reputation: 735

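You can try this pattern. It only matches when the quoted text differs from the number captured after 'arrived:' (group 1 is the arrived number, group 3 is the text between the single quotes):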

arrived:\s*(Z\d{11})((?!arrived)[\s\S])*'((?:(?!\1)[^'])+)'

Here is a demo: https://regex101.com/r/RAI4Zh/1
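Since the goal is a C# tool, here is a minimal sketch of how such a pattern could be used from C#. Note that this is a tightened variant and not the exact pattern above: the third group above accepts any quoted text, so on a log with several entries it can run from one entry's closing quote to the next entry's opening quote. The variant requires the quoted part to be a Z-number itself. The class name MismatchFinder and the group names arrived and processed are my own, and the sketch assumes the whole text fits in one string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MismatchFinder
{
    // Assumed variant of the pattern above:
    //  - (?<arrived>...)        the number right after "arrived:"
    //  - (?:(?!arrived:)...)*?  lazily skip text, never crossing into the next entry
    //  - (?!\k<arrived>')       reject a quoted number equal to the arrived one
    //  - (?<processed>...)      the differing quoted number
    private static readonly Regex Pattern = new Regex(
        @"arrived:\s*(?<arrived>Z\d{11})(?:(?!arrived:)[\s\S])*?'(?!\k<arrived>')(?<processed>Z\d{11})'",
        RegexOptions.Compiled);

    // Returns one (arrived, processed) pair per entry whose quoted number differs.
    public static List<(string Arrived, string Processed)> Find(string log)
    {
        var result = new List<(string Arrived, string Processed)>();
        foreach (Match m in Pattern.Matches(log))
        {
            result.Add((m.Groups["arrived"].Value, m.Groups["processed"].Value));
        }
        return result;
    }
}
```

An empty result means every quoted number equals the number that arrived before it. Entries that never log a quoted number are silently skipped, which may or may not be what you want.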

Upvotes: 2

Mirza Ghulam Rasyid

Reputation: 184

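Just use this pattern with RegexOptions.Compiled for speed. It lists every occurrence of the number, quoted or not, so you can compare them afterwards. Note that RegexOptions.Multiline only changes how ^ and $ behave (they match at every line break instead of only at the start and end of the input). The pattern has no anchors and already finds matches on every line, so that option has no effect here.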

using System; // needed for Console
using System.Text.RegularExpressions;

string logFileContent = "Line 144: 07:16:36:933 | Important event received: number arrived: Z00000012345\r\nLine 162: 07:16:42:314 | Processing and doing extremely important stuff...\r\nLine 164: 07:16:42:374 | Almost ready with processing number Z00000012345\r\nLine 165: 07:16:42:374 | Success with processing; number 'Z00000012345' has been processed.\r\n";

// Number: the bare Z-number; WholeMatch: the same number including optional single quotes.
string pattern = @"(?<WholeMatch>\'?(?<Number>Z\d{11})\'?)";
// Multiline only affects ^ and $, so it changes nothing for this pattern.
MatchCollection matches = Regex.Matches(logFileContent, pattern, RegexOptions.Compiled | RegexOptions.Multiline);
foreach (Match match in matches)
{
    // Prints each occurrence; a quoted occurrence is printed with its quotes.
    Console.WriteLine(match.Value);
}

Of course, you can tune the pattern above for speed, or just use something as simple as this:

string pattern = @"\'?(Z\d{11})\'?";
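For files of up to 8 GB, loading everything into one string is not an option in .NET, so a line-by-line scan may be the more practical route. Below is a rough sketch of that idea, not a tested production tool. It assumes entries are not interleaved (each 'arrived:' line is followed by its own quoted number before the next 'arrived:' line), and the names LogScanner and FindMismatches are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LogScanner
{
    // Number right after "arrived:".
    private static readonly Regex Arrived =
        new Regex(@"arrived:\s*(Z\d{11})\b", RegexOptions.Compiled);

    // Number between single quotes.
    private static readonly Regex Processed =
        new Regex(@"'(Z\d{11})'", RegexOptions.Compiled);

    // Streams the input and yields (physical line number, arrived, processed)
    // for every quoted number that differs from the last arrived number.
    public static IEnumerable<(long Line, string Arrived, string Processed)> FindMismatches(TextReader reader)
    {
        string pending = null; // last arrived number still waiting for its quoted counterpart
        long lineNo = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNo++;
            Match a = Arrived.Match(line);
            if (a.Success)
            {
                pending = a.Groups[1].Value;
                continue;
            }
            Match p = Processed.Match(line);
            if (p.Success && pending != null)
            {
                string processed = p.Groups[1].Value;
                if (processed != pending)
                {
                    yield return (lineNo, pending, processed);
                }
                pending = null; // entry is complete
            }
        }
    }
}
```

To run it against a file, pass a StreamReader (for example new StreamReader(path), with path being your log file) and loop over the result. Only one line is held in memory at a time, so the file size does not matter much.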

Upvotes: 1
