Reputation: 2024
I have a text file with a repetitive structure: a header followed by a detail record, such as
StopService::
697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
[email protected]::0::::
I want to remove the line break between the header and the detail record so that I can process them as a single record. Because the detail record can itself contain line breaks, I need to remove only the line breaks that directly follow the :: sign.
I'm not a pro with regular expressions, so I searched and tried this approach, but it doesn't work:
string text = File.ReadAllText(path);
Regex.Replace(text, @"(?<=(:))(?!\1):\n", String.Empty);
File.WriteAllText(path, text);
I also tried this:
Regex.Replace(text, @"(?<=::)\n", String.Empty);
Any idea how I can use a regex look-behind in this case? My output should look like this:
StopService::697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
[email protected]::0::::
Upvotes: 2
Views: 128
Reputation: 1541
Here's my quick attempt. It may need some tweaks, as I just dummied up two records for input.
The approach is to define a Regex that identifies the header, line break, and detail (which may include line breaks). Then, just run a replace that puts the header back together with the detail, throwing out the header/detail line break.
The RegexOptions.IgnorePatternWhitespace option is used to allow whitespace in the expression for better readability.
var text = "StopService::" + Environment.NewLine;
text += "697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to" + Environment.NewLine;
text += "[email protected]::0::::" + Environment.NewLine;
text += "StopService::" + Environment.NewLine;
text += "697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to" + Environment.NewLine;
text += "[email protected]::0::::" + Environment.NewLine;
var options = RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace;
// \r?\n accepts both Windows (\r\n) and Unix (\n) line breaks between header and detail.
var matchRegex = new Regex("(?<header>\\w+?::) \\r?\\n (?<detail>.+?::::)", options );
var replacement = "${header}${detail}";
var newText = matchRegex.Replace(text,replacement);
Produces:
StopService::697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
[email protected]::0::::
StopService::697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
[email protected]::0::::
Upvotes: 1
Reputation: 627469
Read the file line by line. Check the first line, and if it is equal to StopService::, do not add a newline (Environment.NewLine) after it.
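The line-by-line idea can be sketched roughly as follows. This is only an illustration under assumptions: the class name HeaderJoiner and the rule that a header line is a single word followed by :: (and nothing else) are hypothetical, and the sketch normalizes the line breaks it keeps to "\n".

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class HeaderJoiner
{
    // Assumption: a header line is one word followed by "::" and nothing else.
    private static readonly Regex HeaderLine = new Regex(@"^\w+::$");

    public static string Join(string text)
    {
        var result = new StringBuilder();
        // Split on any line-break style so each element is one logical line.
        string[] lines = Regex.Split(text, @"\r\n|\n|\r");
        for (int i = 0; i < lines.Length; i++)
        {
            result.Append(lines[i]);
            bool isLast = i == lines.Length - 1;
            // After a header, skip the line break so the detail continues on the same line.
            if (!isLast && !HeaderLine.IsMatch(lines[i]))
            {
                result.Append("\n");
            }
        }
        return result.ToString();
    }
}
```

For a file, something like File.WriteAllText(path, HeaderJoiner.Join(File.ReadAllText(path))) would apply it. Line breaks inside a detail record are kept because those lines do not match the header rule.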
You can match the line break after the first :: using a (?<=^[^:]*::) look-behind:
var str = "StopService::\r\n697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to\r\n[email protected]::0::::";
var rgx = new Regex(@"(?<=^[^:]*::)[\r\n]+");
Console.WriteLine(rgx.Replace(str, string.Empty));
Output:
StopService::697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
[email protected]::0::::
The look-behind ((?<=...)) matches:
^ - Start of string
[^:]* - 0 or more characters other than :
:: - 2 colons
The [\r\n]+ pattern makes sure we match all newline symbols, even if there is more than one.
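Without RegexOptions.Multiline, ^ matches only at the very start of the string, so the pattern joins just the first header. If the file holds many records, a variant along these lines might help. It is a sketch under assumptions: the class name MultiRecordJoiner is hypothetical, the character class is narrowed to [^:\r\n] so the look-behind stays on one line, and lines are assumed to end in "\n" or "\r\n".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class MultiRecordJoiner
{
    // With Multiline, ^ also matches right after each "\n", so every line that
    // consists of non-colon characters followed by "::" is treated as a header.
    private static readonly Regex HeaderBreak =
        new Regex(@"(?<=^[^:\r\n]*::)[\r\n]+", RegexOptions.Multiline);

    // Removes only the line break(s) that directly follow a header line.
    public static string Join(string text)
    {
        return HeaderBreak.Replace(text, string.Empty);
    }
}
```

The break after the closing :::: of a record is left alone, because the look-behind would need a line start immediately before the final ::, and there is none.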
Upvotes: 2
Reputation: 2152
Try this:
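yourtext = Regex.Replace(yourtext, @"(?<=::)(?:\r\n|\n|\r)", string.Empty);
You were on the right track with the look-behind idea, but you need to match a carriage return, a newline, or both together. Note that [::] is a character class that matches a single colon, so the look-behind has to be written as (?<=::).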
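It is also worth checking how the result is used: Regex.Replace returns a new string and leaves its input untouched, so the code in the question writes the original text back to the file. A minimal sketch of the corrected flow, where the class name BreakAfterColons is hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class BreakAfterColons
{
    public static string Remove(string text)
    {
        // Strings are immutable: the cleaned text is the return value,
        // the 'text' argument itself is never modified.
        return Regex.Replace(text, @"(?<=::)(?:\r\n|\n|\r)", string.Empty);
    }
}
```

Used as text = BreakAfterColons.Remove(text); before File.WriteAllText(path, text). This removes every line break that directly follows ::, including one after the closing :::: of a record, so it fits best when each record is processed on its own.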
Upvotes: 1
Reputation: 145
Javascript:
yourtext.replace(/(\r\n|\n|\r)/gm," ");
I haven't tested the C# version, but it should look something like this (C# patterns take no /.../gm delimiters or flags):
C#:
yourtext = Regex.Replace(yourtext, @"\r\n|\n|\r", " ");
Upvotes: 0