Peter Jurkovic

Reputation: 2896

How to match sequence in a negate set

Consider the following expression:

((password|secret)(=|%3D%22))+([^&|\"|%22]*)

And this value:

http://host?foo=bar&xml=%3C%3Fxml+id%3D%220abc987%22+password%3D%22secreT12aa5%22+binds%3D%222%22

The xml parameter contains the URL-encoded form of <?xml id="0abc987" password="secreT12aa5" binds="2"

What I would like to achieve is to match password="secreT12aa5" and then replace it with, e.g., password="****".

The issue is that the given regular expression matches only the part of the value up to the first 2 (secreT1), because of the %22 in the negated set. The percent sign seems to be ignored.

How can I change the expression to match password%3D%22secreT12aa5 (the whole password value)?

The expression should also match http://host?password=value, which it currently does.


I would also like to use this regular expression for replacements, using the replaceAll() method to strip the matching parameter value.

So the regex ((password)(=|%3D%22))([^&|\\"]*)(%22)? with the replacement $1[PROTECTED]$5 should automatically replace:

password=VALUE 
to => 
password=[PROTECTED]

password=VALUE&secret=VALUE 
to => 
password=[PROTECTED]&secret=[PROTECTED]

http://host?foo=bar&xml=%3C%3Fxml+id%3D%220abc987%22+password%3D%22secreT12345%22+binds%3D%222%22 
to => 
http://host?foo=bar&xml=%3C%3Fxml+id%3D%220abc987%22+password%3D%22[PROTECTED]%22+binds%3D%222%22

Upvotes: 1

Views: 475

Answers (1)

Wiktor Stribiżew

Reputation: 626950

Note that [^&|\"|%22] is a negated character class that matches any char but &, | (yes, a pipe), ", % and 2, since inside a character class all the chars are treated individually, not as sequences.
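A quick way to see this in Java is to run the question's original expression and print what the value group captures. This is a sketch: the class name NegatedClassDemo and the helper capturedValue are hypothetical, and group 4 is the ([^&|\"|%22]*) part of the original pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // The question's original expression. Inside [...] every character is a separate
    // member, so the class excludes &, |, ", % and 2 individually, not the sequence %22.
    static final Pattern ORIGINAL =
        Pattern.compile("((password|secret)(=|%3D%22))+([^&|\"|%22]*)");

    /** Returns group 4 (the "value" part) of the first match, or null if nothing matches. */
    static String capturedValue(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.find() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        String url = "http://host?foo=bar&xml=%3C%3Fxml+id%3D%220abc987%22"
                + "+password%3D%22secreT12aa5%22+binds%3D%222%22";
        System.out.println(capturedValue(url));                          // stops before the first '2'
        System.out.println(capturedValue("http://host?password=value")); // no excluded chars, full value
    }
}
```

The first call prints secreT1, because the class stops at the first 2 it meets, not at %22.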

You may use

password(?:="?|%3D%22)(?:(?!%22)[^&\"])*"?

See the regex demo

Details

  • password - a literal substring
  • (?:="?|%3D%22) - either = followed by an optional " or %3D%22
  • (?:(?!%22)[^&\"])* - any char but & and " ([^&\"]), repeated 0 or more times, as many as possible (*), as long as it does not start a %22 char sequence (a so-called tempered greedy token).
  • "? - an optional ".
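To use this for the replaceAll() masking described in the question, one option is to wrap the key and its = / %3D%22 prefix in a capturing group and drop the trailing "?, so the closing " or %22 is simply left in place. The sketch below is my adaptation, not the answer's exact pattern: the (?:password|secret) key list, the class name PasswordMasker and the method mask are assumptions added to cover the question's examples.

```java
import java.util.regex.Pattern;

public class PasswordMasker {
    // Adapted from the answer (assumption: key list and capturing group are additions).
    // Group 1 keeps the key plus "=", "=\"" or "%3D%22"; the tempered greedy token
    // (?:(?!%22)[^&"])* eats the value and stops before &, " or the encoded quote %22.
    private static final Pattern SECRET_VALUE =
        Pattern.compile("((?:password|secret)(?:=\"?|%3D%22))(?:(?!%22)[^&\"])*");

    /** Replaces every matched parameter value with [PROTECTED]. */
    static String mask(String input) {
        // Same replacement syntax as String.replaceAll: $1 re-inserts group 1.
        return SECRET_VALUE.matcher(input).replaceAll("$1[PROTECTED]");
    }

    public static void main(String[] args) {
        System.out.println(mask("password=VALUE"));
        System.out.println(mask("password=VALUE&secret=VALUE"));
        System.out.println(mask("http://host?foo=bar&xml=%3C%3Fxml+id%3D%220abc987%22"
                + "+password%3D%22secreT12345%22+binds%3D%222%22"));
        System.out.println(mask("password=\"secreT12aa5\""));
    }
}
```

Because the value match stops before %22 instead of consuming it, the replacement needs only $1 and no separate group for the closing quote.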

You may rewrite the pattern using the "unroll-the-loop" principle as

password(?:="?|%3D%22)[^&\"%]*(?:%(?!22)[^%&\"]*)*"?
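The unrolled version should find the same matches as the tempered one; it just runs the (?!22) lookahead only when it actually sees a %, instead of before every character. The sketch below compares both patterns on a few sample strings. The samples (including the 50%25off one, which has a % that is not part of %22) and the names UnrolledPatternCheck and allMatches are my own, for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnrolledPatternCheck {
    // Tempered greedy token version from the answer.
    static final Pattern TEMPERED =
        Pattern.compile("password(?:=\"?|%3D%22)(?:(?!%22)[^&\"])*\"?");
    // Unroll-the-loop version: [^&"%]* consumes ordinary chars in one go, and each
    // % is accepted only if it is not followed by 22.
    static final Pattern UNROLLED =
        Pattern.compile("password(?:=\"?|%3D%22)[^&\"%]*(?:%(?!22)[^%&\"]*)*\"?");

    static final String[] SAMPLES = {
        "http://host?password=value",
        "password=VALUE&password=OTHER",
        "password=\"secreT12aa5\" binds=\"2\"",
        "xml=%3C%3Fxml+password%3D%22secreT12aa5%22+binds%3D%222%22",
        "password=50%25off&x=1"
    };

    /** Collects every match of the pattern in the input, in order. */
    static List<String> allMatches(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : SAMPLES) {
            List<String> a = allMatches(TEMPERED, s);
            List<String> b = allMatches(UNROLLED, s);
            System.out.println(a + (a.equals(b) ? " (same)" : " (different: " + b + ")"));
        }
    }
}
```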

See another demo.

Alternatively, some would prefer a lazy pattern followed by an alternation that includes a lookahead:

password(?:="?|%3D%22)[^&\"]*?(?:(?=%22)|\"|$)
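A quick Java check of the lazy pattern on my own sample strings suggests one caveat: [^&\"]*? cannot step over &, and none of the three endings ((?=%22), " or $) applies right before an &, so password=VALUE&secret=VALUE gets no match at all. The LAZY_WITH_AMP variant, which adds (?=&) to the alternation, is a hypothetical tweak of mine and not part of the answer; the class and helper names are likewise illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyPatternCheck {
    // Lazy version from the answer: grow [^&"]*? one char at a time until the rest
    // starts with %22 (lookahead, not consumed), a literal " (consumed) or the end.
    static final Pattern LAZY =
        Pattern.compile("password(?:=\"?|%3D%22)[^&\"]*?(?:(?=%22)|\"|$)");
    // Hypothetical variant (not from the answer): also stop right before an &.
    static final Pattern LAZY_WITH_AMP =
        Pattern.compile("password(?:=\"?|%3D%22)[^&\"]*?(?:(?=%22)|\"|(?=&)|$)");

    /** First match in the input, or null when there is none. */
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(LAZY, "http://host?password=value"));
        System.out.println(firstMatch(LAZY,
            "xml=%3C%3Fxml+password%3D%22secreT12aa5%22+binds%3D%222%22"));
        System.out.println(firstMatch(LAZY, "password=VALUE&secret=VALUE"));          // no match
        System.out.println(firstMatch(LAZY_WITH_AMP, "password=VALUE&secret=VALUE"));
    }
}
```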

See yet another regex demo.

Upvotes: 2
