Reputation: 5524
Working in .NET, I'm parsing a log file where some lines do not begin with '"2018'. I need a regex that finds every line beginning with anything except the string "2018 (note that this includes the double quote). When such a line is found (and this is the tricky bit), remove the line break at the end of the line before it. In other words, append each offending line to the line above it.
"2018-02-22 10:06:10,857","[7]"," ERROR","MyApp.Web.Infrastructure.ErrorResponseCommand","ErrorResponseCMD logs Controller: webinar | Action: Index",""
"2018-02-22 10:06:37,742","[11]"," INFO ","MyApp.Web.MvcApplication","Anon Session Starts with: {""FirstPage"": ""https://www.bankwebinars.com/wp-login.php"", ""QueryString"": """", ""SessionId"": ""uhnev2dnds33dastwrdgftvm"", ""FirstCookies"": {""CookieName"": ""ASP.NET_SessionId"", ""Value"": ""uhnev2dnds33dastwrdgftvm""}}",""
"2018-02-22 10:06:48,053","[11]"," INFO ","MyApp.Web.Controllers.CartController","SessionInfo{
""FirstPage"": null,
""RemoteAddress"": ""207.46.13.159"",
""RemoteHost"": ""207.46.13.159"",
""RemoteUser"": """",
RelativeConfirmPasswordResetUrl:Account/PasswordResetConfirm
//and other non-predictable BOL patterns.
},""
"2018-02-22 10:06:10,857","[7]"," ERROR","MyApp.Web.Infrastructure.ErrorResponseCommand","ErrorResponseCMD logs Controller: webinar | Action: Index",""
ADDENDUM: Having tried the suggested pattern, and noting that it works correctly in regex101's sandbox, there must be something else wrong. Here's my current code.
string str = File.ReadAllText("myLog.log");
Regex rx = new Regex("(?m)\r?\n^(?!\"2018)", RegexOptions.Singleline);
str = rx.Replace(str, "\"2018");
File.WriteAllText("test1.txt", str);
I've tried a bunch of variations on the pattern. For example, I think the RegexOptions clause is equivalent to the (?m) prefix, so I've tried omitting that. Singleline should be what I want, since it views the whole file as a single line, but I've tried Multiline mode as well. It's a Windows file, so the ? quantifier between \r and \n should not be required. None of the variations have changed the output.
Upvotes: 3
Views: 670
Reputation: 48741
1- The documentation page for File.ReadAllText() emphasizes:
The resulting string does not contain the terminating carriage return and/or line feed.
If that's the problem, take a look at this thread. I'm not a .NET guru.
2- You should @-quote the regex string, taking care with the inner double quotation mark ("" denotes a " in an @-quoted string), and remove the s flag (RegexOptions.Singleline), which is unnecessary here.
Regex rx = new Regex(@"(?m)\r?\n^(?!""2018)");
3- Next is the replacement string you provided. You should replace with nothing. A zero-width negative lookahead assertion only asserts and doesn't consume, so the match contains just the line break:
str = rx.Replace(str, "");
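Putting the three points together, a minimal sketch might look like the following. The class and method names (LogJoiner, JoinContinuationLines) are hypothetical, and the sketch assumes that every real log record starts with "2018, as in the question.

```csharp
using System.Text.RegularExpressions;

public static class LogJoiner
{
    // Matches a line break only when the next line does NOT start with "2018.
    // (?m) lets ^ match at the start of each line; the lookahead consumes nothing,
    // so the match itself is just the \r\n (or \n).
    private static readonly Regex ContinuationBreak =
        new Regex(@"(?m)\r?\n^(?!""2018)");

    // Hypothetical helper: removes the line break in front of every continuation line.
    public static string JoinContinuationLines(string text)
    {
        return ContinuationBreak.Replace(text, "");
    }
}
```

With the file names from the question, it could be called as File.WriteAllText("test1.txt", LogJoiner.JoinContinuationLines(File.ReadAllText("myLog.log")));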
Upvotes: 2
Reputation: 31312
Here is a regex replace that does the job:
str = Regex.Replace(str, @"\r?\n(?!""2018)", String.Empty);
The following code from the question is incorrect:
Regex rx = new Regex("(?m)\r?\n^(?!\"2018)", RegexOptions.Singleline);
str = rx.Replace(str, "\"2018");
(?!\"2018) is a negative lookahead. Like other lookarounds, it does not consume the text it checks, so the match contains only the line break. That's why rx.Replace(str, "\"2018") inserts "2018 in place of every removed line break. For example, for the input:
"2018" Line 1
"2018" Line 2
 Sub-line 1
 Sub-line 2
"2018" Line 3
you'll get the following result:
"2018" Line 1
"2018" Line 2"2018 Sub-line 1"2018 Sub-line 2
"2018" Line 3"2018
That's why you should replace the matched parts with just an empty string. In this case you will get the correct result:
"2018" Line 1
"2018" Line 2 Sub-line 1 Sub-line 2
"2018" Line 3
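The difference between the two replacement strings can be reproduced with a small sketch. The class name ReplacementDemo and its members are made up for illustration, and the sample text assumes that each sub-line starts with a single space and that the input has no trailing line break.

```csharp
using System.Text.RegularExpressions;

public static class ReplacementDemo
{
    // Sample input; the sub-lines are assumed to start with one space.
    public const string Input =
        "\"2018\" Line 1\r\n" +
        "\"2018\" Line 2\r\n" +
        " Sub-line 1\r\n" +
        " Sub-line 2\r\n" +
        "\"2018\" Line 3";

    // The lookahead only checks what follows; the match itself is just the line break.
    private const string Pattern = @"\r?\n(?!""2018)";

    // Replacing with "2018 inserts that text where each matched line break was.
    public static string WithQuotedYear() => Regex.Replace(Input, Pattern, "\"2018");

    // Replacing with an empty string simply joins the continuation lines.
    public static string WithEmpty() => Regex.Replace(Input, Pattern, string.Empty);
}
```

Because the lookahead never becomes part of the match, only the replacement string decides what ends up between the joined lines.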
Upvotes: 2
Reputation: 2867
I was able to get what I think is the desired result by doing the following:
logString = Regex.Replace(logString, @"\r\n\s\s", "", RegexOptions.Multiline);
Upvotes: 0