Alex Hedley

Reputation: 786

Regex - Get all text after pattern from log C#

I have the following log file that I'd like to parse in C#.

I've gone down the route of using a regex to split most of it. I've tested this in RegExr with the MultiLine (m) flag checked.

Log

5376:0084 2015-08-07 13:51:29.103 Error ### Error Message ###
5376:0084 2015-08-07 13:51:35.545 Error Discarding invalid session
    System.Web.Services.Protocols.SoapException: Verify Session ID failed
    at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
    at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
5376:0084 2015-08-07 13:51:36.013 Error ### Error Message ###

Split to Table:

| ProcessID | DateTime                | Type  | Message               |
|-----------|-------------------------|-------|-----------------------|
| 5376:0084 | 2015-08-07 13:51:29.103 | Error | ### Error Message ### |

I've used the following pattern

string pattern = @"(.*:\d{4}) ((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}).(\d{3})) ([A-Za-z\n]+) (.*$)";

This gets lines 1,3 & 6 but I'd like to gather lines 2-5 into one group. So "Discarding ... parameters)" would be the whole Message.

Upvotes: 1

Views: 772

Answers (4)

NeverHopeless

Reputation: 11233

Besides regex, you can also parse a log file with the TextFieldParser class (Microsoft.VisualBasic.FileIO.TextFieldParser). It does add an otherwise unnecessary dependency on the Microsoft.VisualBasic assembly, but it is a good option.

Here is a good tutorial on how to consume this class.

Upvotes: 0

Mi Ke Bu

Reputation: 224

Alex. Try this:

string pattern = @"^(\S+) (\S+ \S+) (\S+) ((?:.*(?:\n\s)?)+)";

(Sample is here: https://regex101.com/r/uI4uQ0/1)

  1. (\S+) = "5376:0084";
  2. (\S+ \S+) = "2015-08-07 13:51:29.103"
  3. (\S+) = "Error" (one word only)
  4. ((?:.*(?:\n\s)?)+) = "...session
        System...
        at..."

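The magic is here: "\n\s". It requires a line break followed by a whitespace character, so the indented continuation lines are absorbed into the message, while a new entry (which starts with a digit) ends it.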
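
To see the "\n\s" trick in action, here is a minimal sketch that runs the pattern above over an abridged copy of the log from the question. The class and method names (ContinuationDemo, FindEntries) are made up for illustration, and it assumes the log uses "\n" line endings and that RegexOptions.Multiline is set so ^ matches at every line start.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContinuationDemo
{
    // Abridged copy of the question's log (one stack-trace line dropped).
    public static readonly string SampleLog = string.Join("\n",
        "5376:0084 2015-08-07 13:51:29.103 Error ### Error Message ###",
        "5376:0084 2015-08-07 13:51:35.545 Error Discarding invalid session",
        "    System.Web.Services.Protocols.SoapException: Verify Session ID failed",
        "    at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)",
        "5376:0084 2015-08-07 13:51:36.013 Error ### Error Message ###");

    // Hypothetical helper: returns one match per log entry.
    public static MatchCollection FindEntries(string log)
    {
        // Multiline makes ^ match at the start of each line, not only at the start of the string.
        return Regex.Matches(log,
            @"^(\S+) (\S+ \S+) (\S+) ((?:.*(?:\n\s)?)+)",
            RegexOptions.Multiline);
    }

    public static void Main()
    {
        foreach (Match m in FindEntries(SampleLog))
        {
            Console.WriteLine("{0} | {1} | {2} | {3}",
                m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value);
        }
    }
}
```

Group 4 of the second match should span the "Discarding invalid session" line plus the indented lines below it. Unindented continuation lines would not be picked up by this approach.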

Good luck, MiKe.

Upvotes: 0

Avinash Raj

Reputation: 174696

You also need to match the newline characters that occur in the Message part. This can be achieved with the DOTALL modifier s (RegexOptions.Singleline in .NET), which makes . match newlines as well.

@"(?s)(\d+:\d{4}) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ([A-Za-z\n]+) (.*?)(?=\n\d+:|$)"

or

@"(?s)(?:\n|^)(\d+:\d{4}) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ([A-Za-z\n]+) (.*?)(?=\n\d+:|$)"
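
As a rough sketch of how the first pattern could feed the table from the question, the snippet below maps the four numbered groups onto a small record and parses the timestamp with DateTime.ParseExact. The LogEntry and SinglelineLogParser types are hypothetical, the sample is an abridged copy of the question's log, and "\n" line endings are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical row type mirroring the columns of the table in the question.
public sealed class LogEntry
{
    public string ProcessId;
    public DateTime Timestamp;
    public string Type;
    public string Message;
}

public static class SinglelineLogParser
{
    // Abridged copy of the question's log.
    public static readonly string SampleLog = string.Join("\n",
        "5376:0084 2015-08-07 13:51:29.103 Error ### Error Message ###",
        "5376:0084 2015-08-07 13:51:35.545 Error Discarding invalid session",
        "    System.Web.Services.Protocols.SoapException: Verify Session ID failed",
        "5376:0084 2015-08-07 13:51:36.013 Error ### Error Message ###");

    // (?s) lets . cross line breaks; the lookahead stops the lazy message
    // before the next "digits:" line start or at the end of the input.
    private static readonly Regex EntryRegex = new Regex(
        @"(?s)(\d+:\d{4}) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ([A-Za-z\n]+) (.*?)(?=\n\d+:|$)");

    public static List<LogEntry> Parse(string log)
    {
        var entries = new List<LogEntry>();
        foreach (Match m in EntryRegex.Matches(log))
        {
            entries.Add(new LogEntry
            {
                ProcessId = m.Groups[1].Value,
                Timestamp = DateTime.ParseExact(m.Groups[2].Value,
                    "yyyy-MM-dd HH:mm:ss.fff", CultureInfo.InvariantCulture),
                Type = m.Groups[3].Value,
                Message = m.Groups[4].Value
            });
        }
        return entries;
    }

    public static void Main()
    {
        foreach (LogEntry e in Parse(SampleLog))
        {
            Console.WriteLine("{0} | {1:O} | {2} | {3}", e.ProcessId, e.Timestamp, e.Type, e.Message);
        }
    }
}
```

Because the pattern is not anchored to a line start, a message that itself contains something shaped like an entry header could confuse it; the second variant with (?:\n|^) is stricter in that respect.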


Upvotes: 1

Wiktor Stribiżew

Reputation: 626690

In log parsing, named captures are a great help, and I strongly advise using them. You can also get more control over what . captures by using an inline singleline modifier group, (?s:...). This way you do not need the global RegexOptions.Singleline option, and outside that group . still matches any character but a newline.

Here is my attempt:

var pattern = @"(?m)^(?<ProcessID>\d{4}:\d{4})\s+(?<DTime>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d+)\s+(?<Type>\w+)\s+(?<Message>(?s:.*?(?=\n\d+:\d+|\r?\z)))";

Here, (?m) enables multiline mode so that ^ matches at the start of each line. I replaced the ID and datetime subpatterns with more specific, more efficient \d{n} ones. The Type part can be adjusted to your needs (e.g. [\w\s]+). The Message part matches any number of characters up to an XXXX:XXXX ID at the start of a new line (due to \n\d+:\d+) or up to the end of the string (\z). See regex demo, see Table tab.
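
For reference, a small sketch of reading the named groups from C#. The NamedGroupLogParser class and its members are illustrative names only, the sample is an abridged copy of the question's log, and "\n" line endings are assumed (with "\r\n" input the Message of non-final entries would likely end in a stray "\r", hence the TrimEnd call).

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupLogParser
{
    // Pattern copied from the answer; (?m) and (?s:...) are inline, so no RegexOptions are needed.
    public const string Pattern =
        @"(?m)^(?<ProcessID>\d{4}:\d{4})\s+(?<DTime>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d+)\s+(?<Type>\w+)\s+(?<Message>(?s:.*?(?=\n\d+:\d+|\r?\z)))";

    // Abridged copy of the question's log.
    public static readonly string SampleLog = string.Join("\n",
        "5376:0084 2015-08-07 13:51:29.103 Error ### Error Message ###",
        "5376:0084 2015-08-07 13:51:35.545 Error Discarding invalid session",
        "    System.Web.Services.Protocols.SoapException: Verify Session ID failed",
        "5376:0084 2015-08-07 13:51:36.013 Error ### Error Message ###");

    // Hypothetical helper: one match per log entry.
    public static MatchCollection Parse(string log) => Regex.Matches(log, Pattern);

    public static void Main()
    {
        foreach (Match m in Parse(SampleLog))
        {
            // Groups are looked up by name instead of by position.
            Console.WriteLine("ProcessID: " + m.Groups["ProcessID"].Value);
            Console.WriteLine("DateTime:  " + m.Groups["DTime"].Value);
            Console.WriteLine("Type:      " + m.Groups["Type"].Value);
            // TrimEnd guards against a trailing "\r" or newline left in the message.
            Console.WriteLine("Message:   " + m.Groups["Message"].Value.TrimEnd());
            Console.WriteLine();
        }
    }
}
```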


Upvotes: 1
