Frank Claydon

Reputation: 31

Matching inverse of multi line regex pattern

I have the following regex in Notepad++, which finds a line starting with "BGADD" and, if present, all subsequent lines that start with "+". This match works, and I can use replace (with a zero-length string) to remove these blocks of text from my document.

Regex:

^BGADD.*$(\R|\z)(^[+].*$(\R|\z))*
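
For anyone wanting to reproduce the behaviour outside Notepad++, here is a minimal Python sketch of the same removal. It is an approximation, not the Notepad++ engine: Python's `re` module has no `\R` or `\z`, so `(?:\n|\Z)` is an assumed stand-in that only handles `\n` line endings. The name `BGADD_BLOCK` and the shortened sample are illustrative.

```python
import re

# Assumed translation of the Notepad++ pattern: (\R|\z) becomes (?:\n|\Z).
# re.MULTILINE makes ^ and $ match at line boundaries, as Notepad++ does.
BGADD_BLOCK = re.compile(
    r"^BGADD.*$(?:\n|\Z)(?:^[+].*$(?:\n|\Z))*",
    re.MULTILINE,
)

# A few lines taken from the data in the question.
sample = (
    "BGADD       1000100010011000\n"
    "+       30001002300010035000\n"
    "BGSET   10001002100001071000\n"
    "+           1011    1012    \n"
    "BGADD       1000100010011000\n"
)

# Replacing each match with an empty string removes the BGADD blocks.
remaining = BGADD_BLOCK.sub("", sample)
print(remaining, end="")
```

Only the BGSET line and its "+" continuation line should survive, because a "+" line is consumed only when it directly follows a BGADD line or another "+" line of the same block.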

What I would like to do however is match and remove any blocks of text that are the opposite of this, such that I am left with only what matches the pattern.

I have tried combinations of positive/negative lookarounds but am failing to come up with something that works, possibly because the lookaround must be fixed length? I have tried numerous searches and have attempted to implement various things that I have read on similar threads, but have not got there yet. Thanks in advance for any help.

The data I am working with:

BGADD       1000100010011000
+       30001002300010035000
+          91016   91017   9
+          91024   91025   9
BGSET   10001002100001071000
+           1011    1012    
+           1019    1020    
BGADD       1000100010011000
BGADD       1000100010011000
+       30001002300010035000
+       19001006290010013900
BGSET   20001001200001012000
+           1011    1012    
SOMETHINGELSE  3000100230000
BGADD       1000100010011000
+       30001002300010035000
+       19001006290010013900
BGSET   30001003300001033000
BGSET   50001001500001035000
BGADD       1000100010011000
+       30001002300010035000
+       19001006290010013900
DIFFERENTTHING   19001001190
+           1011    1012    
+           1019    1020    
BGSET   19001002190001071900
BGADD       1000100010011000

What it looks like if I replace with blank string on the regex:

BGSET   10001002100001071000
+           1011    1012    
+           1019    1020    
BGSET   20001001200001012000
+           1011    1012    
SOMETHINGELSE  3000100230000
BGSET   30001003300001033000
BGSET   50001001500001035000
DIFFERENTTHING   19001001190
+           1011    1012    
+           1019    1020    
BGSET   19001002190001071900

What I am aiming for (i.e. the inverse):

BGADD       1000100010011000
+       30001002300010035000
+          91016   91017   9
+          91024   91025   9
BGADD       1000100010011000
BGADD       1000100010011000
+       30001002300010035000
+       19001006290010013900
BGADD       1000100010011000
+       30001002300010035000
+       19001006290010013900
BGADD       1000100010011000
+       30001002300010035000
+       19001006290010013900
BGADD       1000100010011000

Upvotes: 1

Views: 210

Answers (1)

Frank Claydon

Reputation: 31

It turns out I think I have answered my own question, thanks to the suggestion from @Mako212, which provided the first part I hadn't thought of yet. The regex I have constructed finds a line that doesn't start with "BGADD" or "+" and then, using the same basis as the first regex in my question, keeps matching lines until it reaches the next instance of "BGADD".

What I think is the answer:

^(?!BGADD|[+]).*$(\R|\z)(.*$(\R|\z))*?(?=BGADD)

Edited to simplify and to capture the last line of the document:

^(?!BGADD|[+])(.*$(\R|\z))+?(?=BGADD|\z)
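
A rough way to check the edited pattern outside Notepad++ is the Python sketch below. As an assumption, `(\R|\z)` is again translated to `(?:\n|\Z)` because Python's `re` lacks `\R` and `\z`, so only `\n` line endings are covered. The names `NON_BGADD_BLOCK` and `BGADD_BLOCK` are illustrative, and the cross-check against the original pattern is an addition, not part of the answer.

```python
import re

# Assumed translation of the edited answer regex.
# A match starts on a line that begins with neither BGADD nor "+",
# then lazily consumes whole lines until the next line starts with BGADD
# or the end of the text is reached.
NON_BGADD_BLOCK = re.compile(
    r"^(?!BGADD|[+])(?:.*$(?:\n|\Z))+?(?=BGADD|\Z)",
    re.MULTILINE,
)

# The original pattern from the question, used here only as a cross-check.
BGADD_BLOCK = re.compile(
    r"^BGADD.*$(?:\n|\Z)(?:^[+].*$(?:\n|\Z))*",
    re.MULTILINE,
)

# A few lines taken from the data in the question.
sample = (
    "BGADD       1000100010011000\n"
    "+       30001002300010035000\n"
    "BGSET   20001001200001012000\n"
    "+           1011    1012    \n"
    "SOMETHINGELSE  3000100230000\n"
    "BGADD       1000100010011000\n"
    "BGSET   19001002190001071900\n"
)

# Inverse removal: delete everything that is not part of a BGADD block.
kept = NON_BGADD_BLOCK.sub("", sample)

# Cross-check: concatenating the matches of the original pattern
# should give the same text.
kept_by_matching = "".join(m.group(0) for m in BGADD_BLOCK.finditer(sample))

print(kept, end="")
```

In a scripting language the second approach (collecting the matches of the original pattern) avoids the inverse regex altogether; the inverse form is mainly useful in an editor's search-and-replace dialog, where only removal is available.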

Output:

BGADD       1000100010011000
+       30001002300010035000
+          91016   91017   9
+          91024   91025   9
BGADD       1000100010011000
BGADD       1000100010011000
+       30001002300010035000
+       19001006290010013900       
BGADD       1000100010011000
+       30001002300010035000
+       19001006290010013900
BGADD       1000100010011000
+       30001002300010035000
+       19001006290010013900
BGADD       1000100010011000

Upvotes: 2
