Praveen

Reputation: 627

Need help with a regex pattern to mask Java toString() output

I have a Java toString() on code generated from XML. As a company we write the toString() output to our logs, and I am having trouble writing a regex that masks all the data effectively. Here is a sample toString() result:

String input="com.example.sensitive.info.UserInfo@15b1534[name=User1, clientName=HARVARD LAW SCHOOL, THE, clientId=12345]";

Expected output:

com.example.sensitive.info.UserInfo@15b1534[name=User1, clientName=****************, clientId=12345]

Can someone help me with a regex that masks everything up to the last comma (,) before the next equals sign (=)?

Here is what I tried:

maskPatterns.add("clientName=(.*?)=");

This ends up masking everything up to the next =. I can't figure out how to make it backtrack to the last comma (,) before the next equals sign (=).

Also, if anyone has a better regex for this, I am all ears.

Upvotes: 4

Views: 202

Answers (3)

The fourth bird

Reputation: 163362

Going by your example maskPatterns.add("clientName=(.*?)=");, I assume that you want the value in capture group 1.

If the pattern should not depend on the square brackets to mark the end of the value, but should not match them either, you might use:

\bclientName=([^\r\n,=\[\]]+(?:,(?!\h*\w+=)[^\r\n,=\[\]]*)*)

Explanation

  • \bclientName= A word boundary, then match clientName=
  • ( Capture group 1
    • [^\r\n,=\[\]]+ Match 1+ times any char except , = [ ] or a newline
    • (?: Non capture group
      • ,(?!\h*\w+=) Match a comma asserting what is directly to the right is not 0+ horizontal whitespace chars, 1+ word chars and an = sign
      • [^\r\n,=\[\]]* Match 0+ times any char except , = [ ] or a newline
    • )* Close the non capture group and repeat it 0+ times to match every comma that is part of the value
  • ) Close group 1

Regex demo

If the [ and ] can also be part of the clientName, you can omit them from the character classes.
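
As a rough illustration of how the pattern could be applied in Java, the sketch below doubles the backslashes for a string literal and replaces only what group 1 captured. The class name, method name and mask length are hypothetical and chosen for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClientNameMasker {
    // Fixed-length mask; the length is an arbitrary choice for this sketch.
    static final String MASK = "****************";

    // The pattern from this answer, escaped for a Java string literal.
    private static final Pattern CLIENT_NAME = Pattern.compile(
            "\\bclientName=([^\\r\\n,=\\[\\]]+(?:,(?!\\h*\\w+=)[^\\r\\n,=\\[\\]]*)*)");

    // Replaces the text captured by group 1 with MASK and keeps everything else.
    static String mask(String input) {
        Matcher m = CLIENT_NAME.matcher(input);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(input, last, m.start(1)).append(MASK);
            last = m.end(1);
        }
        return sb.append(input.substring(last)).toString();
    }

    public static void main(String[] args) {
        String input = "com.example.sensitive.info.UserInfo@15b1534"
                + "[name=User1, clientName=HARVARD LAW SCHOOL, THE, clientId=12345]";
        System.out.println(mask(input));
    }
}
```

For the sample input, group 1 captures HARVARD LAW SCHOOL, THE, so the comma before clientId stays in place.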

Upvotes: 0

Tim Biegeleisen

Reputation: 521379

Use String#replaceAll here:

String input = "com.example.sensitive.info.UserInfo@15b1534[name=User1, clientName=HARVARD LAW SCHOOL, THE, clientId=12345]";
// Group 1 captures the optional comma and whitespace before the next key so they can be restored.
String output = input.replaceAll("\\bclientName=.*?(,?\\s*)(?=\\w+=|\\])", "clientName=****************$1");
System.out.println(input);
System.out.println(output);

This prints:

com.example.sensitive.info.UserInfo@15b1534[name=User1, clientName=HARVARD LAW SCHOOL, THE, clientId=12345]
com.example.sensitive.info.UserInfo@15b1534[name=User1, clientName=****************, clientId=12345]

Note that the number of asterisks probably should not match the number of original characters in the clientName. Doing so would partially reveal the original content, insofar as it would reveal at least the original length of the clientName string.
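
To make the fixed-length idea concrete, here is a minimal sketch that masks several fields with the same constant mask, so the output does not depend on the length of any value. The class, the method, the mask width and the list of field names are assumptions made for illustration; the lookahead used here stops before the comma that precedes the next key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FixedLengthMasker {
    // Same mask for every value, whatever its original length (hypothetical width).
    static final String MASK = "********";

    // Masks the value of each listed field up to the next ", key=" or the closing "]".
    static String mask(String input, List<String> fieldNames) {
        String alternation = fieldNames.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        String regex = "\\b(" + alternation + ")=.*?(?=,\\s*\\w+=|\\])";
        return input.replaceAll(regex, "$1=" + MASK);
    }

    public static void main(String[] args) {
        String input = "com.example.sensitive.info.UserInfo@15b1534"
                + "[name=User1, clientName=HARVARD LAW SCHOOL, THE, clientId=12345]";
        System.out.println(mask(input, Arrays.asList("name", "clientName")));
    }
}
```

With both name and clientName listed, the two values end up with identical masks even though their lengths differ.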

Upvotes: 1

Wiktor Stribiżew

Reputation: 626893

You can use

clientName=(.*?)(?=\s*,\s*\w+=|\])

See the regex demo

Details

  • clientName= - a literal string
  • (.*?) - Group 1: any zero or more chars other than line break chars as few as possible
  • (?=\s*,\s*\w+=|\]) - a positive lookahead that requires, immediately to the right of the current location, either a comma enclosed with zero or more whitespaces (\s*,\s*) followed by one or more word chars and =, or (|) a ] char (\]).
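
A small sketch of how this pattern might be used from Java, both to read Group 1 and to replace the whole match with a fixed mask. The class name, method names and mask length are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadMaskDemo {
    // Arbitrary fixed-length mask for this sketch.
    static final String MASK = "****************";

    // The pattern above, escaped for a Java string literal.
    private static final Pattern P =
            Pattern.compile("clientName=(.*?)(?=\\s*,\\s*\\w+=|\\])");

    // Returns what Group 1 captured, or null when the field is absent.
    static String capturedValue(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Replaces the key and its value with the key followed by the mask.
    static String mask(String text) {
        return P.matcher(text).replaceAll("clientName=" + MASK);
    }

    public static void main(String[] args) {
        String text = "com.example.sensitive.info.UserInfo@15b1534"
                + "[name=User1, clientName=HARVARD LAW SCHOOL, THE, clientId=12345]";
        System.out.println(capturedValue(text));
        System.out.println(mask(text));
    }
}
```

Because the lookahead does not consume anything, the comma and the following key remain in the masked string.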

Or, if you need the same amount of asterisks, use

String result = text.replaceAll("(\\G(?!^)|clientName=).(?=.*?,\\s*\\w+=|\\])", "$1*");

See this regex demo.

Details

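  • (\G(?!^)|clientName=) - Group 1: either the end of the previous successful match (\G(?!^)) or the literal string clientName=
  • . - any char but a line break char
  • (?=.*?,\s*\w+=|\]) - a positive lookahead that requires, to the right of the current location, either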
    • .*?,\s*\w+= - any zero or more chars other than line break chars as few as possible, a comma, zero or more whitespaces, one or more word chars and a =
    • | - or
    • \] - a ] char.
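
The same-length replacement can be tried with a short sketch like the one below, run against the sample string from the question. The class and method names are hypothetical; the regex and replacement are the ones shown above.

```java
class SameLengthMaskDemo {
    // Replaces each character of the clientName value with one asterisk,
    // chaining matches through \G so only that value is affected.
    static String maskKeepingLength(String text) {
        return text.replaceAll(
                "(\\G(?!^)|clientName=).(?=.*?,\\s*\\w+=|\\])", "$1*");
    }

    public static void main(String[] args) {
        String text = "com.example.sensitive.info.UserInfo@15b1534"
                + "[name=User1, clientName=HARVARD LAW SCHOOL, THE, clientId=12345]";
        System.out.println(maskKeepingLength(text));
    }
}
```

For the sample input, every character of HARVARD LAW SCHOOL, THE (including the inner comma and spaces) becomes one asterisk, and the comma before clientId is left alone because no further comma followed by a key comes after it.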

Upvotes: 1
