user610217

Multiline Regular Expression replace

OK, there are lots of regular expression questions, but as always, none of them seem to match what I'm trying to do.

I have a text file:

F00220034277909272011                                  
H001500020003000009272011                              
D001500031034970000400500020000000025000000515000000000
D001500001261770008003200010000000025000000132500000000
H004200020001014209272011                              
D004200005355800007702200005142000013420000000000000000
D004200031137360000779000005000000012000000000000000000
H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000

and, with a multiline regex (.NET flavored), I want to do a replace so that I get:

H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000

so that, basically, I grab everything that starts with [HD]0501 and nothing else.

I know this seems more suited to a match than a replace, but I'm going through a pre-built engine that accepts only a Regex pattern string and a Regex replace string.

What can I supply for a pattern and a replace string to get my desired result? Note that the Multiline regex option is hardcoded in the engine's configuration.

I originally thought something like this would work:

search: (?<Match>^[HD]0501\d+$), but this matched nothing.

search: (?!^[HD]0501\d+$), but this matched a bunch of empty strings, and I couldn't figure out what to put for the replace string.

search: (?!(?<Omit>^[HD]0501\d+$)), but this failed with "Group 'Omit' not found."

It seems this should be simple, but as always, Regex manages to make me feel dumb. Help would be greatly appreciated.

Upvotes: 2

Views: 244

Answers (1)

Bart Kiers

Reputation: 170148

Try matching the following pattern:

(?m)^(?![HD]0501).+(\r?\n)?

and replace it with an empty string.

The following demo:

using System;
using System.Text.RegularExpressions;

namespace Test
{
  class MainClass
  {  
    public static void Main (string[] args)
    {
      string input = @"F00220034277909272011                                  
H001500020003000009272011                              
D001500031034970000400500020000000025000000515000000000
D001500001261770008003200010000000025000000132500000000
H004200020001014209272011                              
D004200005355800007702200005142000013420000000000000000
D004200031137360000779000005000000012000000000000000000
H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000";

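      // Match every line that does NOT start with H0501 or D0501,
      // including its trailing line break (if any).
      string regex = @"(?m)^(?![HD]0501).+(\r?\n)?";

      // Replacing those matches with an empty string leaves only the wanted lines.
      Console.WriteLine(Regex.Replace(input, regex, ""));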
    }
  }
}

prints:

H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000

A quick explanation:

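  • (?m)
    • enable multi-line mode so that ^ matches at the start of every line, not only at the start of the input;
  • ^
    • match the start of a line;
  • (?![HD]0501)
    • negative look-ahead: continue only if the line does not start with "H0501" or "D0501";
  • .+
    • match one or more chars other than \n, i.e. the rest of the line;
  • (\r?\n)?
    • match an optional line break, so the removed line leaves no empty line behind (optional because the last line may not end with one).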
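
To make the line-break handling concrete, here is a minimal sketch that runs the same pattern over a tiny sample with both \n and \r\n line endings. The sample data, the class name LineFilterSketch and the helper KeepWantedLines are made up for illustration and are not part of the original demo. The last two lines also show one possible reason the first attempt from the question, ^[HD]0501\d+$, matched nothing: in .NET, $ in multi-line mode matches only directly before \n (or at the end of the input), so a \r in front of the \n blocks it, and the trailing spaces on the H line block \d+$ in either case. Whether the real file uses \r\n line endings is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineFilterSketch
{
    // The pattern from the answer above.
    const string Pattern = @"(?m)^(?![HD]0501).+(\r?\n)?";

    // Hypothetical helper: removes every line that does not start with H0501 or D0501.
    public static string KeepWantedLines(string text)
    {
        return Regex.Replace(text, Pattern, "");
    }

    static void Main()
    {
        // Made-up sample; the H line ends with a space, like the header lines in the question.
        string lf = "F0022\nH0501 \nD0501123\nD0042\n";
        string crlf = lf.Replace("\n", "\r\n");

        // Both line-ending styles keep only the two 0501 lines, line breaks intact.
        Console.WriteLine(KeepWantedLines(lf) == "H0501 \nD0501123\n");       // True
        Console.WriteLine(KeepWantedLines(crlf) == "H0501 \r\nD0501123\r\n"); // True

        // The first attempt from the question, with Multiline switched on.
        const string attempt = @"^[HD]0501\d+$";
        Console.WriteLine(Regex.IsMatch(lf, attempt, RegexOptions.Multiline));   // True: the D0501 line matches
        Console.WriteLine(Regex.IsMatch(crlf, attempt, RegexOptions.Multiline)); // False: \r sits between the digits and \n
    }
}
```

If the input does turn out to use \r\n, writing \r?$ instead of $ is the usual way to make an end-of-line anchor work in .NET; the pattern in this answer avoids the problem because it never uses $.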

Upvotes: 3
