Reputation: 786
I have the following log file that I'd like to parse in C#.
I've gone down the route of using a RegEx to get most of it split. I've tested this in RegExr with MultiLine (m) flag checked.
Log
5376:0084 2015-08-07 13:51:29.103 Error ### Error Message ###
5376:0084 2015-08-07 13:51:35.545 Error Discarding invalid session
System.Web.Services.Protocols.SoapException: Verify Session ID failed
at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
5376:0084 2015-08-07 13:51:36.013 Error ### Error Message ###
Split to Table:
| ProcessID | DateTime | Type | Message |
|-----------|-------------------------|-------|-----------------------|
| 5376:0084 | 2015-08-07 13:51:29.103 | Error | ### Error Message ### |
I've used the following pattern:
string pattern = @"(.*:\d{4}) ((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}).(\d{3})) ([A-Za-z\n]+) (.*$)";
This gets lines 1,3 & 6 but I'd like to gather lines 2-5 into one group. So "Discarding ... parameters)" would be the whole Message.
Upvotes: 1
Views: 772
Reputation: 11233
Besides regex, you can parse a log file with the `TextFieldParser` class. It adds an arguably unnecessary dependency on the Microsoft.VisualBasic assembly (the full type name is `Microsoft.VisualBasic.FileIO.TextFieldParser`), but it is a good one.
Here is a good tutorial on how to consume this class.
Upvotes: 0
Reputation: 224
Alex. Try this:
string pattern = @"^(\S+) (\S+ \S+) (\S+) ((?:.*(?:\n\s)?)+)";
(Sample is here: https://regex101.com/r/uI4uQ0/1)
The magic is here: "\n\s". It says that we need a line break AND any whitespace character right after it.
Good luck, MiKe.
Upvotes: 0
Reputation: 174696
You also need to match the newline characters that exist in the Message part. This can be achieved with the DOTALL modifier `s`.
@"(?s)(\d+:\d{4}) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ([A-Za-z\n]+) (.*?)(?=\n\d+:|$)"
or
@"(?s)(?:\n|^)(\d+:\d{4}) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ([A-Za-z\n]+) (.*?)(?=\n\d+:|$)"
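As a rough illustration, the first pattern above can be applied with `Regex.Matches` as in the sketch below. The class name `DotAllLogDemo`, the `Parse` helper, and the `SampleLog` field are hypothetical names introduced only for this example, and the sample assumes the log uses plain `\n` line endings (with `\r\n`, a trailing `\r` would likely stay in each message and need trimming).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DotAllLogDemo
{
    // First pattern from the answer, unchanged. (?s) lets '.' match newlines.
    private const string Pattern =
        @"(?s)(\d+:\d{4}) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ([A-Za-z\n]+) (.*?)(?=\n\d+:|$)";

    // Sample log from the question, joined with "\n" (an assumption about line endings).
    public static readonly string SampleLog = string.Join("\n", new[]
    {
        "5376:0084 2015-08-07 13:51:29.103 Error ### Error Message ###",
        "5376:0084 2015-08-07 13:51:35.545 Error Discarding invalid session",
        "System.Web.Services.Protocols.SoapException: Verify Session ID failed",
        "at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)",
        "at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)",
        "5376:0084 2015-08-07 13:51:36.013 Error ### Error Message ###"
    });

    // Returns one row per log entry: ProcessID, DateTime, Type, Message.
    public static List<string[]> Parse(string log)
    {
        var rows = new List<string[]>();
        foreach (Match m in Regex.Matches(log, Pattern))
        {
            rows.Add(new[]
            {
                m.Groups[1].Value, // ProcessID
                m.Groups[2].Value, // DateTime
                m.Groups[3].Value, // Type
                m.Groups[4].Value  // Message, possibly spanning several lines
            });
        }
        return rows;
    }
}
```

Calling `DotAllLogDemo.Parse(DotAllLogDemo.SampleLog)` on this sample should give three rows, with the stack trace folded into the Message of the second one.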
Upvotes: 1
Reputation: 626690
Named captures are a great help in log parsing, and I strongly advise using them. You can also get more control over what `.` captures by using an inline singleline modifier group, `(?s:...)`. This way you do not have to set the global `RegexOptions.Singleline` option, and outside that group you can still use `.` to match any character except a newline.
Here is my attempt:
var pattern = @"(?m)^(?<ProcessID>\d{4}:\d{4})\s+(?<DTime>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d+)\s+(?<Type>\w+)\s+(?<Message>(?s:.*?(?=\n\d+:\d+|\r?\z)))";
Here, `(?m)` sets the multiline mode so that `^` matches at the start of each line. I replaced the ID and datetime subpatterns with more efficient ones based on `\d{n}`. The `Type` part can be adjusted to your needs (e.g. `[\w\s]+`), and the `Message` part matches an arbitrary number of characters, but only up to the next XXX:XXXX ID at the start of a new line (due to `\n\d+:\d+`) or up to the end of the string (`\z`).
See regex demo, see Table tab.
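A minimal sketch of how the named groups might be consumed in C# follows. The `LogEntry` and `NamedGroupLogParser` types are hypothetical names made up for this example. It also assumes `\n` line endings, a single space between date and time, and exactly three fractional-second digits, which is what the sample log shows; `DateTime.ParseExact` would throw on anything else.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed log entry.
public sealed class LogEntry
{
    public string ProcessId { get; set; } = "";
    public DateTime Timestamp { get; set; }
    public string Type { get; set; } = "";
    public string Message { get; set; } = "";
}

// Hypothetical helper class, for illustration only.
public static class NamedGroupLogParser
{
    // Pattern from the answer, unchanged.
    private static readonly Regex EntryRegex = new Regex(
        @"(?m)^(?<ProcessID>\d{4}:\d{4})\s+(?<DTime>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d+)\s+(?<Type>\w+)\s+(?<Message>(?s:.*?(?=\n\d+:\d+|\r?\z)))");

    public static List<LogEntry> Parse(string log)
    {
        var entries = new List<LogEntry>();
        foreach (Match m in EntryRegex.Matches(log))
        {
            entries.Add(new LogEntry
            {
                ProcessId = m.Groups["ProcessID"].Value,
                // Assumes the "yyyy-MM-dd HH:mm:ss.fff" layout seen in the sample log.
                Timestamp = DateTime.ParseExact(
                    m.Groups["DTime"].Value,
                    "yyyy-MM-dd HH:mm:ss.fff",
                    CultureInfo.InvariantCulture),
                Type = m.Groups["Type"].Value,
                // Multi-line messages (e.g. stack traces) arrive as a single string.
                Message = m.Groups["Message"].Value
            });
        }
        return entries;
    }
}
```

Reading groups by name keeps the calling code independent of group order, so the pattern can be adjusted later without renumbering indexes.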
Upvotes: 1