Reputation: 41
This is my first question, so please bear with me while I try to write this as neatly and completely as possible!
I am trying to perform a find and replace in Notepad++ using regex, but I am getting some strange results that I do not understand. Can someone explain where I am going wrong and what I can do to achieve my desired outcome please?
I am using Notepad++ version 6.8.3
I have a number of log files where any customer information has to be redacted. I have to find the text Name: and replace everything after it with *REDACTED*.
This has to be done using "Replace in Files". An example of the relevant lines is below:
applicantDetailsCommand.firstName: Arnold
blah blah blah blah blah blah blah blah blah blah blah
applicantDetailsCommand.middleName: Judas
applicantDetailsCommand.lastName: Rimmer
blah blah blah blah blah blah blah blah
blah blah blah blah
applicantDetailsCommand.firstName: Dave
applicantDetailsCommand.middleName: Cinzano Bianco
applicantDetailsCommand.lastName: Lister
blah blah blah blah blah blah
blah blah blah
In order to do this I started searching using a look-behind thus:
(?<=Name: ).*$
which worked fine and found all of the entries after firstName, lastName, etc. However, in any file that did not contain "Name: ", the whole file matched including all of the lines, so I cannot use this in "Replace in Files" as it will just replace the whole file with "*REDACTED*".
Then I tried to match the string without using look-behind, so searched for (Name: ).*$ and was going to replace this with $1\*REDACTED\*, which worked a treat, but I also discovered that it picked up several other lines, such as "host_name" and "URIName", that I did not want.
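To illustrate the over-matching, here is a minimal sketch in Python. It uses Python's re module rather than the engine inside Notepad++, so details may differ. It assumes the search ran with "Match case" unticked, which would explain why "host_name" is caught, and the host_name and URIName values are invented for the example.

```python
import re

# Hypothetical sample lines; only the first one is meant to be redacted.
sample_lines = [
    "applicantDetailsCommand.firstName: Arnold",
    "host_name: server01",
    "URIName: /api/v1",
]

# Assumption: case-insensitive search, mirroring an unticked "Match case" box.
broad = re.compile(r"(Name: ).*$", re.IGNORECASE)

# Collect every line the broad pattern would touch.
matched_lines = [line for line in sample_lines if broad.search(line)]
print(matched_lines)  # all three lines are matched, not just firstName
```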
At this point I decided to use a group containing only the alternatives that I actually did want to match, so tried this:
(first|middle|last|account)Name: .*$
which started matching whole files when none of accountName, firstName, middleName or lastName was present in the file.
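For reference, this is how I would expect that pattern to behave. It is only a sketch using Python's re module (not the Notepad++ engine), and the host_name and URIName lines are made-up sample data: the pattern should hit only the wanted fields and find nothing at all in a file without them.

```python
import re

# Same pattern as above; MULTILINE makes $ match at the end of each line.
pattern = re.compile(r"(first|middle|last|account)Name: .*$", re.MULTILINE)

log_with_names = (
    "applicantDetailsCommand.firstName: Arnold\n"
    "host_name: server01\n"
    "applicantDetailsCommand.lastName: Rimmer"
)
log_without_names = "blah blah blah\nhost_name: server01\nURIName: /api/v1"

# Only the wanted fields are hit when they are present...
hits = [m.group(0) for m in pattern.finditer(log_with_names)]
print(hits)

# ...and a file without them yields no match at all.
no_hit = pattern.search(log_without_names)
print(no_hit)
```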
I've read through lots of different articles on the tinterweb, but can't find anything that will explain why, when there is no match, the full file is matched.
Any help explaining this would be much appreciated.
Many thanks.
Upvotes: 1
Views: 1742
Reputation: 41
This is indeed a bug - after much searching I have eventually found this on GitHub: https://github.com/notepad-plus-plus/notepad-plus-plus/issues/683
This seems to happen only when using grouping and on files over a certain length. In one file, splitting it in two correctly gave no matches for (?<=\d{8}(,|:) ).*?(?=>|\)), while keeping the file whole resulted in the regex selecting the whole file. Strangely, though, searching for (?<=\d{4}(,|:) ).*?(?=>|\)) in the same files worked OK, where the number of digits in the first grouping was 4 instead of 8 but the regex was otherwise identical!
Also, in another search I modified one of the original regexes in my question from (first|middle|last|account)Name: .*$ to (first|middle|last)Name: .*$ and that also started working, as did keeping the same regex and halving the length of the file. I also tried (rst|dle|ast|unt)Name: .*$, which failed, and (first|middle|account)Name: .*$, which worked, all of which is pretty random and can't be tied down to any one thing being the problem.
This leads me to believe that there is a fundamental problem in the regex engine and as a result we are now ditching Notepad++ as a solution and are purchasing something else instead as the regex engine cannot be relied upon to be correct.
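For anyone who needs a stopgap, the same redaction can be scripted. This is only a rough sketch in Python under my own assumptions, not the tool we ended up buying: the function names, the *.log glob and the UTF-8 encoding are all hypothetical, and the extra capturing group around the field name is added so the prefix can be kept in the replacement. The key point is that a file is rewritten only when at least one replacement was actually made.

```python
import re
from pathlib import Path

# Group 1 keeps the field prefix; everything after it on the line is replaced.
NAME_FIELD = re.compile(
    r"((?:first|middle|last|account)Name: ).*$", re.MULTILINE
)

def redact_text(text: str) -> tuple[str, int]:
    """Return the redacted text and the number of replacements made."""
    return NAME_FIELD.subn(r"\g<1>*REDACTED*", text)

def redact_tree(root: str, glob: str = "*.log") -> list[Path]:
    """Rewrite matching files under root; files with no match are never written."""
    changed = []
    for path in Path(root).rglob(glob):
        original = path.read_text(encoding="utf-8")  # assumed encoding
        redacted, count = redact_text(original)
        if count:  # skip files that contain none of the fields
            path.write_text(redacted, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling redact_tree on a copy of the log directory first would be the cautious way to try it.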
Hope that helps someone.
Upvotes: 2
Reputation:
Lose the $ because that usually means end of string unless multi-line mode is on.
However, you don't need multi-line mode. And you have to turn OFF the "dot matches all characters" option, so that the dot will match anything but line breaks.
Lastly, using (?<=Name: ).*$ could result in backtracking issues if Name: is not found.
Lose the lookbehind and make it:
Find: Name:\h.*
Replace: Name: REDACTED
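A quick way to sanity-check this outside Notepad++ is the sketch below. Python's re module does not support \h, so [ \t] stands in for it here; that is a substitution on my part and only an approximation of "horizontal whitespace".

```python
import re

# [ \t] approximates \h (horizontal whitespace), which Python's re lacks.
pattern = re.compile(r"Name:[ \t].*")

line = "applicantDetailsCommand.lastName: Rimmer"
redacted = pattern.sub("Name: REDACTED", line)
print(redacted)  # applicantDetailsCommand.lastName: REDACTED
```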
Upvotes: 0
Reputation: 8937
Find what: Name: .*
Replace with: Name: *REDACTED*
This method should work even if you add the group (first|middle|last|account) before the Find what pattern. Refer to the gifs below for the exact settings (I'm using version 6.8.6, by the way):
What happens when file contains your search:
And what happens when it doesn't:
Upvotes: 0
Reputation: 4992
Make sure you disable the checkbox ". finds \r and \n" in the Search and replace window.
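To show why that checkbox matters, here is a small sketch using Python's re module, where the DOTALL flag plays roughly the same role (an analogy, not the Notepad++ implementation). With the option on, .* no longer stops at the end of the line and swallows the rest of the text.

```python
import re

text = "firstName: Arnold\nblah blah\nmore blah"

# Dot does NOT match newlines: the match stops at the end of the line.
line_only = re.search(r"Name: .*", text).group(0)

# Dot matches newlines (DOTALL): the match runs to the end of the text.
to_end = re.search(r"Name: .*", text, re.DOTALL).group(0)

print(repr(line_only))
print(repr(to_end))
```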
Upvotes: 0