Reputation: 2024

How can I use lookbehind in a C# Regex in order to remove line breaks?

I have a text file with a repetitive structure of a header and a detail record, such as:

StopService::
697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
[email protected]::0::::

I want to remove the line break between the header and the detail record so that I can process them as a single record. Because the detail record can contain line breaks as well, I need to remove only the line breaks that directly follow the :: sign.

I'm not a pro with regular expressions, so I searched and tried this approach, but it doesn't work:

 string text = File.ReadAllText(path);
 Regex.Replace(text, @"(?<=(:))(?!\1):\n", String.Empty);
 File.WriteAllText(path, text);

I also tried this:

Regex.Replace(text, @"(?<=::)\n", String.Empty);

Any idea how I can use a regex look-behind in this case? My output should look like this:

StopService::697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
    [email protected]::0::::

Upvotes: 2

Answers (4)

Darryl

Reputation: 1541

Here's my quick attempt. It may need some tweaks, as I just dummied up two records for input.

The approach is to define a Regex that identifies the header, line break, and detail (which may include line breaks). Then, just run a replace that puts the header back together with the detail, throwing out the header/detail line break.

The RegexOptions.IgnorePatternWhitespace option is used to allow whitespace in the expression for better readability.

var text = "StopService::" + Environment.NewLine;
text += "697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to" + Environment.NewLine;
text += "[email protected]::0::::"  + Environment.NewLine;
text += "StopService::" + Environment.NewLine;
text += "697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to" + Environment.NewLine;
text += "[email protected]::0::::"  + Environment.NewLine;

var options = RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace;
var matchRegex = new Regex("(?<header>\\w+?::) \\r\\n (?<detail>.+?::::)", options );
var replacement = "${header}${detail}";

var newText = matchRegex.Replace(text,replacement);

Produces:

StopService::697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
[email protected]::0::::
StopService::697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
[email protected]::0::::

Upvotes: 1

Wiktor Stribiżew

Reputation: 627469

Non-regex Way

Read the file line by line. Check the first line and, if it is equal to StopService::, do not append a newline (Environment.NewLine) after it.
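
A minimal sketch of that line-by-line approach, assuming .NET 6+ top-level statements. The helper name JoinHeaderLines and the shortened sample lines are hypothetical stand-ins, not part of the original answer. Because the file has a repetitive structure, the sketch checks every line against the header rather than only the first one, which is also an assumption.

```csharp
using System;
using System.Text;

// Hypothetical helper: glues each header line onto the line that follows it.
static string JoinHeaderLines(string[] lines, string header = "StopService::")
{
    var sb = new StringBuilder();
    foreach (var line in lines)
    {
        sb.Append(line);
        // Keep the line break unless this line is exactly the header.
        if (line != header)
            sb.Append(Environment.NewLine);
    }
    return sb.ToString();
}

// Shortened sample data (assumed); real input would come from the file.
var lines = new[]
{
    "StopService::",
    "697::12::test::Please send me Information to",
    "someone::0::::"
};
Console.Write(JoinHeaderLines(lines));
```

For a real file, the array could come from File.ReadAllLines(path). Note that this sketch always ends the output with a line break, even if the source file did not.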

Regex way

You can match the line break after the first :: using a (?<=^[^:]*::) look-behind:

var str = "StopService::\r\n697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to\r\n[email protected]::0::::";
var rgx = new Regex(@"(?<=^[^:]*::)[\r\n]+");
Console.WriteLine(rgx.Replace(str, string.Empty));

Output:

StopService::697::12::test::20::[email protected]::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
[email protected]::0::::

See IDEONE demo

The look-behind ((?<=...)) matches:

^ - Start of string
[^:]* - 0 or more characters other than :
:: - 2 colons

The [\r\n]+ pattern makes sure we match all newline symbols, even if there is more than one.
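
If the file holds several header/detail records, the ^ anchor above only helps with the first one. A possible variation, sketched here under assumptions, adds RegexOptions.Multiline so that ^ matches at the start of every line, and excludes \r and \n from the character class so the look-behind stays within a single line. The sample data is a shortened stand-in, and \r?\n removes exactly one line break per header instead of a run of them.

```csharp
using System;
using System.Text.RegularExpressions;

// Two shortened sample records (assumed data, not from the original post).
var input =
    "StopService::\r\n697::12::test::Please send me Information to\r\nsomeone::0::::\r\n" +
    "StopService::\r\n698::13::test::More text\r\n";

// Multiline: ^ matches at the start of each line, not only at the start of the string.
// The look-behind requires the whole line so far to be "<no colons>::".
var rgx = new Regex(@"(?<=^[^:\r\n]*::)\r?\n", RegexOptions.Multiline);
var output = rgx.Replace(input, string.Empty);
Console.Write(output);
```

Lines that merely end in :: after other colons, such as a detail line ending in ::::, are left alone because the look-behind allows only one :: on the line.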

Upvotes: 2

JStevens

Reputation: 2152

Try this:

yourtext = Regex.Replace(yourtext, @"(?<=::)(?:\r\n|\n|\r)", string.Empty);

You were on the right track with the lookbehind idea, but you need to match a carriage return, a newline, or both together.

Upvotes: 1

redsam

Reputation: 145

JavaScript:

yourtext.replace(/(\r\n|\n|\r)/gm," ");

I haven't tested the C# version, but it should look something like this.

C#:

yourtext = Regex.Replace(yourtext, @"\r\n|\n|\r", " ");

Upvotes: 0
