kayleeFrye_onDeck

Reputation: 6968

Only find one match in a specific location in a string

I'm trying to process a large build log looking for copy operations going to the wrong place. I'm just using Notepad++.

If I have a string like this:

Line 25672: Creating hard link to copy C:\DevDir\DERP\Output\x64\Release\someBin.dll to C:\DevDir\not\good\path\here\someBin.dll

and this

Line 25673: Creating hard link to copy C:\DevDir\not\good\path\here\someBin.dll to C:\DevDir\DERP\Output\x64\Release\someBin.dll

The special word to look for here is DERP. Basically, I need to see when something in DERP is being copied to a non-DERP location, and when something in a non-DERP location is being copied to a DERP location.

So I need to find:

\scopy\s, then DERP, then \sto\s, then NOT DERP to end of line

\scopy\s, then NOT DERP, then \sto\s, then DERP to end of line

I've tried a couple of variations of this to get the first one working. I thought I had the second working after swapping the negative lookaheads, but after manually scrolling through the aggregated results, I saw I was getting DERP on the wrong side of "to".

^.*? copy .*?DERP.*? to (?!DERP).*$

I can't use an answer that relies solely on how many times DERP appears, as relative paths may cause one side of "to" to contain multiple DERPs. This relative-path case is what I'm having trouble accounting for.
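A minimal sketch, assuming Python's `re` module rather than Notepad++, of why the attempt above lets DERP slip onto the wrong side: `(?!DERP)` only checks the single position right after " to ", and a path normally starts with a drive letter there, so the lookahead passes even when DERP appears later in the destination. The DERP-to-DERP line is a hypothetical example, not taken from the real log.

```python
import re

# The first attempt from the question, unchanged.
ATTEMPT = re.compile(r"^.*? copy .*?DERP.*? to (?!DERP).*$")

# Hypothetical line with DERP on both sides; it should NOT be reported.
derp_to_derp = (
    r"Creating hard link to copy C:\DevDir\DERP\a.dll"
    r" to C:\DevDir\DERP\b.dll"
)

# Sample line from the question: DERP source, non-DERP destination.
derp_to_other = (
    r"Creating hard link to copy C:\DevDir\DERP\Output\x64\Release\someBin.dll"
    r" to C:\DevDir\not\good\path\here\someBin.dll"
)

# The lookahead is tested only at "C:\D...", which is not "DERP",
# so both lines match and the first one is a false positive.
false_positive = ATTEMPT.search(derp_to_derp) is not None
true_positive = ATTEMPT.search(derp_to_other) is not None
print(false_positive, true_positive)
```

The check has to be repeated at every character of the destination, not just the first one.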

EDIT:

Hm... did some tinkering and this is looking promising:

(^.*? copy )(?>(?:(?!DERP).)*? to).*?DERP.*?$

Upvotes: 0

Views: 89

Answers (2)

kayleeFrye_onDeck

Reputation: 6968

Okay, to find DERP and then not DERP:

(^.*? copy ).*?DERP.*? to (?>(?:(?!DERP).)*?$)

To find not-DERP and then DERP:

(^.*? copy )(?>(?:(?!DERP).)*? to).*?DERP.*?$

So I guess the answer here is using this type of nesting:

(?>(?:(?!ThingToNotFind).))*?SomethingToFind

If there's a more elegant way to write or articulate this in general terms, I'll accept that answer. While I can use and modify this, I have a hard time looking at these nested patterns and grokking them.
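One way to read the nesting is as a single building block: `(?:(?!DERP).)` means "one character, provided DERP does not start here", and repeating it gives "a stretch of text with no DERP in it". The sketch below is an illustration in Python's `re` module, not the Notepad++ expressions themselves: it drops the atomic group `(?>...)`, which Python only accepts from version 3.11, and the `classify` helper and the relative-path line are hypothetical.

```python
import re

# A run of characters containing no "DERP" anywhere inside it.
NO_DERP = r"(?:(?!DERP).)*"

# DERP somewhere in the source, no DERP anywhere in the destination.
DERP_TO_OTHER = re.compile(r"^.*? copy .*?DERP.*? to " + NO_DERP + r"$")

# No DERP anywhere in the source, DERP somewhere in the destination.
OTHER_TO_DERP = re.compile(r"^.*? copy " + NO_DERP + r" to .*?DERP.*$")


def classify(line):
    """Return a label for a suspicious copy line, or None otherwise."""
    if DERP_TO_OTHER.search(line):
        return "DERP -> other"
    if OTHER_TO_DERP.search(line):
        return "other -> DERP"
    return None


samples = [
    # The two lines from the question.
    r"Line 25672: Creating hard link to copy C:\DevDir\DERP\Output\x64\Release\someBin.dll to C:\DevDir\not\good\path\here\someBin.dll",
    r"Line 25673: Creating hard link to copy C:\DevDir\not\good\path\here\someBin.dll to C:\DevDir\DERP\Output\x64\Release\someBin.dll",
    # Hypothetical relative path with DERP twice on the source side.
    r"Creating hard link to copy C:\DevDir\DERP\..\DERP\a.dll to C:\DevDir\elsewhere\a.dll",
    # Hypothetical DERP-to-DERP copy, which should not be reported.
    r"Creating hard link to copy C:\DevDir\DERP\a.dll to C:\DevDir\DERP\b.dll",
]

labels = [classify(s) for s in samples]
print(labels)
```

Because the check is "no DERP anywhere on this side" rather than a count, the relative-path line with two DERPs on the source side is still classified by where DERP appears.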

Upvotes: 2

fabianegli

Reputation: 2246

Maybe you should also use the path delimiters around DERP to make sure you are really looking at complete folder names.
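A small sketch of that point, assuming Python's `re` module and a hypothetical sibling folder named DERPY: the bare word matches inside the longer folder name, while the delimited form only matches a folder called exactly DERP.

```python
import re

LOOSE = re.compile(r"DERP")        # matches DERP anywhere, even inside a longer name
STRICT = re.compile(r"\\DERP\\")   # literal backslash, DERP, literal backslash

# Hypothetical path whose folder name merely starts with DERP.
lookalike = r"C:\DevDir\DERPY\Output\someBin.dll"
# Path shaped like the ones in the question.
real = r"C:\DevDir\DERP\Output\someBin.dll"

loose_hits = [bool(LOOSE.search(p)) for p in (lookalike, real)]
strict_hits = [bool(STRICT.search(p)) for p in (lookalike, real)]
print(loose_hits, strict_hits)
```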

Solution I

This follows the regex proposed in the question's edit and combines both directions into one expression:

((^.*? copy )(?>(?:(?!DERP).)*? to).*?DERP.*?$|(^.*? copy ).*?(DERP).*? to (?>(?:(?!DERP).)*$))

Check out the playground: https://regex101.com/r/hX9aR4/1

Solution II

Notepad++ supports capture groups. I would therefore use a find-and-replace like this to keep only the erroneous copy lines.

Find: (?:^(?:(?!\\DERP\\).)*$)|(?:^.* copy .*?(?:\\DERP\\).* to .*(?:\\DERP\\).*$)|(^.* copy .*\\DERP\\.*$)|(?:^.*$)

Replace: \1

Explanation

(?:^(?:(?!\\DERP\\).)*$) this matches lines not containing "\DERP\", but does not capture them.

(?:^.* copy .*?(?:\\DERP\\).* to .*(?:\\DERP\\).*$) matches every log line of a copy from DERP to another DERP, but does not capture it.

Then (^.* copy .*\\DERP\\.*$) matches and captures lines with one or more DERP folders, but since the previous alternative has already consumed the lines with DERP on both sides of "to", we should be safe.

Then (?:^.*$) matches all the other lines with a non capturing expression so they are replaced by nothing, as is the first non capturing part.
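The find-and-replace above can be tried outside the editor. This is a rough sketch assuming Python 3.5 or later, where `re.sub` replaces a group that did not participate in the match with an empty string, and assuming `re.MULTILINE` stands in for Notepad++'s line-by-line handling of `^` and `$`. The first two log lines are hypothetical; the last two are the samples from the question.

```python
import re

# The Find expression from Solution II, split by alternative for readability.
FIND = re.compile(
    r"(?:^(?:(?!\\DERP\\).)*$)"                              # no \DERP\ at all
    r"|(?:^.* copy .*?(?:\\DERP\\).* to .*(?:\\DERP\\).*$)"  # DERP on both sides
    r"|(^.* copy .*\\DERP\\.*$)"                             # captured: the suspects
    r"|(?:^.*$)",                                            # anything else
    re.MULTILINE,
)

no_derp = r"Creating hard link to copy C:\DevDir\other\a.dll to C:\DevDir\elsewhere\a.dll"
both_derp = r"Creating hard link to copy C:\DevDir\DERP\a.dll to C:\DevDir\DERP\b.dll"
derp_out = r"Line 25672: Creating hard link to copy C:\DevDir\DERP\Output\x64\Release\someBin.dll to C:\DevDir\not\good\path\here\someBin.dll"
derp_in = r"Line 25673: Creating hard link to copy C:\DevDir\not\good\path\here\someBin.dll to C:\DevDir\DERP\Output\x64\Release\someBin.dll"

log = "\n".join([no_derp, both_derp, derp_out, derp_in])

# Replace every line with group 1; lines that were not captured become empty.
replaced = FIND.sub(r"\1", log)

# Drop the blank lines left behind by the non-capturing alternatives.
kept = [line for line in replaced.splitlines() if line]
print("\n".join(kept))
```

As in the editor, the replace leaves blank lines where uninteresting lines used to be, so a final pass to remove empty lines is still needed.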

Check out the playground: https://regex101.com/r/nJ9iC4/3

Upvotes: 2
