Reputation: 6968
I'm trying to process a large build log looking for copy operations going to the wrong place. I'm just using Notepad++.
If I have a string like this:
Line 25672: Creating hard link to copy C:\DevDir\DERP\Output\x64\Release\someBin.dll to C:\DevDir\not\good\path\here\someBin.dll
and this
Line 25673: Creating hard link to copy C:\DevDir\not\good\path\here\someBin.dll to C:\DevDir\DERP\Output\x64\Release\someBin.dll
The special word to look for here is DERP. Basically, I need to see when something in DERP is being copied to a non-DERP location, and when something in a non-DERP location is being copied to a DERP location.
So I need to find:
\scopy\s, then DERP, then \sto\s, then NOT DERP to end of line
\scopy\s, then NOT DERP, then \sto\s, then DERP to end of line
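For checking the two cases outside Notepad++, here is a minimal Python sketch that splits each line into the part between " copy " and " to " and the part after " to ", then tests each side separately. The names COPY_LINE and classify are made up for illustration, and the sketch assumes that neither path contains the text " to " itself.

```python
import re

# Hypothetical pattern: source path after " copy ", destination after " to ".
# Assumes neither path contains " to " (the lazy src stops at the first one).
COPY_LINE = re.compile(r"\scopy\s(?P<src>.+?)\sto\s(?P<dst>.+)$")

def classify(line, word="DERP"):
    """Return the copy direction if exactly one side mentions `word`, else None."""
    m = COPY_LINE.search(line)
    if m is None:
        return None  # not a copy line
    src_has = word in m.group("src")
    dst_has = word in m.group("dst")
    if src_has and not dst_has:
        return "DERP to non-DERP"
    if dst_has and not src_has:
        return "non-DERP to DERP"
    return None  # both sides or neither side contain the word

first = r"Line 25672: Creating hard link to copy C:\DevDir\DERP\Output\x64\Release\someBin.dll to C:\DevDir\not\good\path\here\someBin.dll"
second = r"Line 25673: Creating hard link to copy C:\DevDir\not\good\path\here\someBin.dll to C:\DevDir\DERP\Output\x64\Release\someBin.dll"
print(classify(first))   # DERP to non-DERP
print(classify(second))  # non-DERP to DERP
```

Because each side is tested on its own, several occurrences of DERP on one side (for example through a relative path) do not change the result.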
I've tried a couple variations of this to get the first one working. I thought I had the second working when swapping the negative lookaheads, but after manually scrolling through the aggregated results, I saw I was getting DERP on the incorrect side of "to".
^.*? copy .*?DERP.*? to (?!DERP).*$
I can't use an answer that relies solely on how many times DERP occurs, as relative paths may cause one side of "to" to contain DERP more than once. This relative-path case is the part I'm having trouble accounting for.
EDIT:
Hm... did some tinkering and this is looking promising:
(^.*? copy )(?>(?:(?!DERP).)*? to).*?DERP.*?$
Upvotes: 0
Views: 89
Reputation: 6968
Okay, to find DERP and then not DERP:
(^.*? copy ).*?DERP.*? to (?>(?:(?!DERP).)*?$)
To find not DERP and then to find DERP:
(^.*? copy )(?>(?:(?!DERP).)*? to).*?DERP.*?$
So I guess the answer here is using this type of nesting:
(?>(?:(?!ThingToNotFind).))*?SomethingToFind
If there's a more elegant way to write or articulate this in general terms, I'll accept that answer. While I can use and modify this, I have a hard time looking at these nested patterns and grokking them.
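One way to read the nested piece is that (?:(?!X).) means "one character, as long as X does not start here", so repeating it gives "a run of text that never contains X". The Python sketch below builds that fragment with a helper and composes the two searches from it. The helper name not_containing is made up for illustration, and the sketch leaves out the atomic group (?>...) used above, on the assumption that it is not needed for correctness on lines like the samples (Python's re module only accepts atomic groups from version 3.11 on).

```python
import re

def not_containing(word):
    """Regex fragment: one character, provided `word` does not start at it."""
    return rf"(?:(?!{re.escape(word)}).)"

WORD = "DERP"
NOT_WORD = not_containing(WORD)

# DERP before " to ", and no DERP anywhere from " to " to end of line.
derp_to_other = re.compile(rf"^.*? copy .*?{WORD}.*? to {NOT_WORD}*$")
# No DERP between " copy " and " to ", then DERP somewhere after " to ".
other_to_derp = re.compile(rf"^.*? copy {NOT_WORD}*? to .*?{WORD}.*$")

good_to_bad = r"Line 25672: Creating hard link to copy C:\DevDir\DERP\Output\x64\Release\someBin.dll to C:\DevDir\not\good\path\here\someBin.dll"
bad_to_good = r"Line 25673: Creating hard link to copy C:\DevDir\not\good\path\here\someBin.dll to C:\DevDir\DERP\Output\x64\Release\someBin.dll"

print(bool(derp_to_other.search(good_to_bad)))  # True
print(bool(other_to_derp.search(good_to_bad)))  # False
print(bool(other_to_derp.search(bad_to_good)))  # True
```

Naming the fragment once keeps the two full patterns readable: each is just "copy", one side, "to", the other side, with NOT_WORD standing in for whichever side must stay free of DERP.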
Upvotes: 2
Reputation: 2246
Maybe you should also use the path delimiters around DERP to make sure you are really looking at complete folder names.
This follows the regex proposed in the edit to the question:
((^.*? copy )(?>(?:(?!DERP).)*? to).*?DERP.*?$|(^.*? copy ).*?(DERP).*? to (?>(?:(?!DERP).)*$))
Check out the playground: https://regex101.com/r/hX9aR4/1
Notepad++ supports capture groups. I would therefore use this to get all the erroneous copying.
Find: (?:^(?:(?!\\DERP\\).)*$)|(?:^.* copy .*?(?:\\DERP\\).* to .*(?:\\DERP\\).*$)|(^.* copy .*\\DERP\\.*$)|(?:^.*$)
Replace: \1
(?:^(?:(?!\\DERP\\).)*$) matches lines not containing "\DERP\", but does not capture them.
(?:^.* copy .*?(?:\\DERP\\).* to .*(?:\\DERP\\).*$) matches every log line of a copy from DERP to another DERP, but does not capture it.
Then (^.* copy .*\\DERP\\.*$) matches and captures lines with one or more DERP folders; since the previous alternative already consumed the lines with DERP on both sides of "to", we should be safe.
Then (?:^.*$) matches all the other lines with a non-capturing expression, so they are replaced by nothing, as are the lines matched by the earlier non-capturing parts.
Check out the playground: https://regex101.com/r/nJ9iC4/3
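As a rough cross-check outside Notepad++, the same Find expression can be run line by line in Python, keeping only what capture group 1 holds. This is a sketch under assumptions: the function name suspicious_lines and the sample log are made up for illustration, and instead of replacing every line with \1 (which leaves empty lines behind) it simply skips the lines where group 1 is empty.

```python
import re

# The four alternatives from the Find expression, in the same order.
FIND = re.compile(
    r"(?:^(?:(?!\\DERP\\).)*$)"                               # no \DERP\ at all
    r"|(?:^.* copy .*?(?:\\DERP\\).* to .*(?:\\DERP\\).*$)"   # DERP on both sides
    r"|(^.* copy .*\\DERP\\.*$)"                              # captured: DERP on one side
    r"|(?:^.*$)"                                              # anything else
)

def suspicious_lines(log_text):
    """Return the lines that the Find expression captures in group 1."""
    kept = []
    for line in log_text.splitlines():
        m = FIND.match(line)
        if m and m.group(1) is not None:
            kept.append(m.group(1))
    return kept

# Made-up sample log: two suspicious copies, one DERP-to-DERP copy, one unrelated line.
sample = "\n".join([
    r"Line 1: Creating hard link to copy C:\DevDir\DERP\Output\a.dll to C:\DevDir\other\a.dll",
    r"Line 2: Creating hard link to copy C:\DevDir\other\a.dll to C:\DevDir\DERP\Output\a.dll",
    r"Line 3: Creating hard link to copy C:\DevDir\DERP\in\a.dll to C:\DevDir\DERP\out\a.dll",
    "Line 4: Compiling foo.cpp",
])
for hit in suspicious_lines(sample):
    print(hit)  # prints Line 1 and Line 2 only
```

The order of the alternatives matters: the capturing alternative is only reached for lines that the first two have already failed on, which is what lets it stay so loose.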
Upvotes: 2