Reputation: 31
I have the following regex in Notepad++, which finds a line starting with "BGADD" and, if applicable, all subsequent lines that start with "+". This match works, and I can use Replace (with a zero-length string) to remove these blocks of text from my document.
Regex:
^BGADD.*$(\R|\z)(^[+].*$(\R|\z))*
What I would like to do, however, is match and remove every block of text that is the opposite of this, so that I am left with only what matches the pattern.
I have tried combinations of positive/negative lookarounds but am failing to come up with something that works, possibly because the lookaround must be fixed length? Thanks in advance for any help. I have tried numerous searches and have attempted to implement various things that I have read on similar threads, but have not got there yet.
The data I am working with:
BGADD 1000100010011000
+ 30001002300010035000
+ 91016 91017 9
+ 91024 91025 9
BGSET 10001002100001071000
+ 1011 1012
+ 1019 1020
BGADD 1000100010011000
BGADD 1000100010011000
+ 30001002300010035000
+ 19001006290010013900
BGSET 20001001200001012000
+ 1011 1012
SOMETHINGELSE 3000100230000
BGADD 1000100010011000
+ 30001002300010035000
+ 19001006290010013900
BGSET 30001003300001033000
BGSET 50001001500001035000
BGADD 1000100010011000
+ 30001002300010035000
+ 19001006290010013900
DIFFERENTTHING 19001001190
+ 1011 1012
+ 1019 1020
BGSET 19001002190001071900
BGADD 1000100010011000
What it looks like if I replace the regex matches with an empty string:
BGSET 10001002100001071000
+ 1011 1012
+ 1019 1020
BGSET 20001001200001012000
+ 1011 1012
SOMETHINGELSE 3000100230000
BGSET 30001003300001033000
BGSET 50001001500001035000
DIFFERENTTHING 19001001190
+ 1011 1012
+ 1019 1020
BGSET 19001002190001071900
What I am aiming for (i.e. the inverse):
BGADD 1000100010011000
+ 30001002300010035000
+ 91016 91017 9
+ 91024 91025 9
BGADD 1000100010011000
BGADD 1000100010011000
+ 30001002300010035000
+ 19001006290010013900
BGADD 1000100010011000
+ 30001002300010035000
+ 19001006290010013900
BGADD 1000100010011000
+ 30001002300010035000
+ 19001006290010013900
BGADD 1000100010011000
Upvotes: 1
Views: 210
Reputation: 31
It turns out I think I have answered my own question, thanks to the suggestion from @Mako212, which provided the first part I hadn't thought of yet. The regex I have constructed finds a line that doesn't start with "BGADD" or "+" and then, on the same basis as the first regex in my question, lazily matches whole lines until it reaches the next instance of "BGADD".
What I think is the answer:
^(?!BGADD|[+]).*$(\R|\z)(.*$(\R|\z))*?(?=BGADD)
Edited to simplify and to capture the last line of the document:
^(?!BGADD|[+])(.*$(\R|\z))+?(?=BGADD|\z)
Output:
BGADD 1000100010011000
+ 30001002300010035000
+ 91016 91017 9
+ 91024 91025 9
BGADD 1000100010011000
BGADD 1000100010011000
+ 30001002300010035000
+ 19001006290010013900
BGADD 1000100010011000
+ 30001002300010035000
+ 19001006290010013900
BGADD 1000100010011000
+ 30001002300010035000
+ 19001006290010013900
BGADD 1000100010011000
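For anyone who wants to cross-check the result outside Notepad++, here is a minimal Python sketch of the same idea. Instead of deleting the inverse, it collects the matches of the first regex from my question and joins them. Python's re module has no \R, so this version uses \n? in its place and assumes the text uses \n line endings. The function name keep_bgadd_blocks and the sample text are only illustrative.

```python
import re

# Translation of the original "BGADD block" regex, assuming "\n" line endings.
# Each match is a BGADD line plus any "+" continuation lines directly after it.
BGADD_BLOCK = re.compile(r"^BGADD.*\n?(?:^[+].*\n?)*", re.MULTILINE)


def keep_bgadd_blocks(text):
    """Return only the BGADD blocks from text, in their original order."""
    return "".join(BGADD_BLOCK.findall(text))


if __name__ == "__main__":
    # Small made-up sample in the same shape as the data in the question.
    sample = (
        "BGADD 1\n"
        "+ a\n"
        "BGSET 2\n"
        "+ b\n"
        "BGADD 3\n"
        "OTHER 4\n"
        "+ c\n"
        "BGADD 5"
    )
    print(keep_bgadd_blocks(sample))
```

Because the pattern must start at a BGADD line, "+" lines that follow a BGSET or any other keyword are never picked up, which is the same behaviour as the lazy match up to the next "BGADD" in the Notepad++ version.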
Upvotes: 2