Reputation: 5533
The answers here suggest using Pattern.quote to escape the special regex characters. The problem with Pattern.quote is that it escapes the string as a whole, not each special character on its own.
This is my case:
I receive a string from the user, and need to search for it in a document.
Since the user can't pass newline characters (it's a bug in a 3rd-party API I have no access to), I decided to treat any whitespace sequence as "\s+" and use a regex to search the document. This way the user can send a plain space instead of a newline character.
For instance, if the document is:
The \s metacharacter is used to find a whitespace character.
A whitespace character can be:
A space character
A tab character
A carriage return character
A new line character
A vertical tab character
A form feed character
Then the received string
String receivedStr = "The \\s metacharacter is used to find a whitespace character. A whitespace character can be:";
should be found in the document.
To achieve this I want to quote the string and then replace any whitespace sequence with the string "\s+".
Using the following code:
receivedStr = Pattern.quote(receivedStr).replaceAll("\\s+", "\\\\s+");
yields the regex:
\QThe\s+\s\s+metacharacter\s+is\s+used\s+to\s+find\s+a\s+whitespace\s+character.\s+A\s+whitespace\s+character\s+can\s+be:\E
which of course treats my added "\s+" sequences as literal text, because everything between \Q and \E is quoted, instead of the expected:
The\s+\\s\s+metacharacter\s+is\s+used\s+to\s+find\s+a\s+whitespace\s+character.\s+A\s+whitespace\s+character\s+can\s+be:
that only escapes the "\s" literal and not the entire string.
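A minimal sketch that reproduces the problem, assuming a shortened two-line document; the class name QuoteWholeStringDemo and the helper naiveRegex are made up for illustration only:

```java
import java.util.regex.Pattern;

class QuoteWholeStringDemo {
    // Builds the regex as in the attempt above: quote the whole string,
    // then swap every whitespace run for the text \s+.
    static String naiveRegex(String received) {
        return Pattern.quote(received).replaceAll("\\s+", "\\\\s+");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real document (an assumption for this demo).
        String document = "find a whitespace character.\nA whitespace character can be:";
        String received = "find a whitespace character. A whitespace character can be:";

        String regex = naiveRegex(received);
        System.out.println(regex);

        // Inside \Q...\E the inserted \s+ is matched as the literal
        // characters '\', 's', '+', so the document is not found.
        System.out.println(Pattern.compile(regex).matcher(document).find()); // false
    }
}
```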
Is there an alternative to Pattern.quote that escapes individual literals instead of the whole string?
Upvotes: 1
Views: 342
Reputation: 421360
I would suggest something like this:
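import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Quote each whitespace-separated token on its own, then join the tokens with an unquoted \s+
String re = Stream.of(input.split("\\s+"))
        .map(Pattern::quote)
        .collect(Collectors.joining("\\s+"));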
This makes sure everything gets quoted (including anything that would otherwise be interpreted as look-arounds and could cause exponential blowup during matching), and any whitespace the user entered ends up as an unquoted \s+.
Example input (backslashes shown doubled, as in a Java string literal):
Lorem \\b ipsum \\s dolor (sit) amet.
Output:
\QLorem\E\s+\Q\b\E\s+\Qipsum\E\s+\Q\s\E\s+\Qdolor\E\s+\Q(sit)\E\s+\Qamet.\E
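For completeness, here is a rough sketch of the same idea wrapped in a helper and run against a shortened version of the document from the question. The class name WhitespaceTolerantSearch, the method toPattern, and the extra trim() call (to avoid an empty leading token when the input starts with whitespace) are my own assumptions, not part of the snippet above:

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WhitespaceTolerantSearch {
    // Quote each whitespace-separated token separately and join the
    // quoted tokens with an unquoted \s+ so any whitespace run matches.
    static Pattern toPattern(String input) {
        String re = Stream.of(input.trim().split("\\s+")) // trim() is an added assumption
                .map(Pattern::quote)
                .collect(Collectors.joining("\\s+"));
        return Pattern.compile(re);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the document in the question.
        String document = "The \\s metacharacter is used to find a whitespace character.\n"
                + "A whitespace character can be:\n"
                + "A space character\n";
        String received = "The \\s metacharacter is used to find a whitespace character. "
                + "A whitespace character can be:";

        // The single space typed by the user matches the newline in the document.
        System.out.println(toPattern(received).matcher(document).find()); // true
    }
}
```

Because every token is quoted individually, characters such as "." or "(" in the user's text are only ever matched literally.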
Upvotes: 2