Jim

Reputation: 19572

Do I always need to escape metacharacters in a string that is not a "literal"?

It seems that a string containing the characters { or } is rejected when it is used as a regex. I understand that these are reserved characters and that I need to escape them, so if I do:

string.replaceAll("\\" + pattern, replacement);

This works, where pattern is any string starting with {.
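To make the problem concrete, here is a minimal sketch assuming a hypothetical placeholder pattern {name}. The raw string fails to compile as a regex, while prefixing one backslash escapes the leading {. Note that this manual fix only escapes the first character; any other metacharacter further along in the string would still be interpreted as regex syntax.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ManualEscapeDemo {

    // Returns true if the given string compiles as a regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Mirrors the question's workaround: escape only the first character of the pattern.
    static String replaceWithManualEscape(String input, String pattern, String replacement) {
        return input.replaceAll("\\" + pattern, replacement);
    }

    public static void main(String[] args) {
        String pattern = "{name}"; // hypothetical placeholder
        System.out.println(compiles(pattern));        // false: a leading '{' is an illegal repetition
        System.out.println(compiles("\\" + pattern)); // true: the regex is now \{name}
        System.out.println(replaceWithManualEscape("Hello {name}!", pattern, "Jim")); // Hello Jim!
    }
}
```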

Question: Is there a way to handle strings that already contain such metacharacters automatically, without escaping them by hand? It seems to me this is the same situation as writing a double quote inside a string literal (where it must be escaped) versus accepting input that already contains a double quote (where nothing needs to be done).

Upvotes: 7

Views: 1109

Answers (4)

Pshemo

Reputation: 124245

TL;DR

  • if you need regex syntax, use replaceAll or replaceFirst;
  • if you want your target/replacement pair to be treated as literals, use replace (it also replaces all occurrences of your target).

Many people are confused by the unfortunate naming of the replacement methods in the String class, which are:

  • replaceAll(String, String)
  • replaceFirst(String, String)
  • replace(CharSequence, CharSequence)
  • replace(char, char)

Since the replaceAll method explicitly claims to replace all possible targets, people assume that the replace method doesn't guarantee such behaviour because it lacks the All suffix.
But this assumption is wrong.
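A quick check of that claim, using a made-up sample string: replace and replaceAll both change every occurrence, and only replaceFirst stops after the first match.

```java
public class ReplaceAllOccurrencesDemo {
    public static void main(String[] args) {
        String s = "a-b-c-d";
        // replace(CharSequence, CharSequence) replaces every occurrence despite lacking "All" in its name
        System.out.println(s.replace("-", "+"));      // a+b+c+d
        // replaceAll gives the same result here because "-" has no special meaning in this regex
        System.out.println(s.replaceAll("-", "+"));   // a+b+c+d
        // replaceFirst is the method that stops after the first match
        System.out.println(s.replaceFirst("-", "+")); // a+b-c-d
    }
}
```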

The main difference between these methods is shown in this table:

╔═════════════════════╦═══════════════════════════════════════════════════════════════════╗
║                     ║                             replaced targets                      ║
║                     ╠════════════════════════════════════╦══════════════════════════════╣
║                     ║           ALL found                ║      ONLY FIRST found        ║
╠══════╦══════════════╬════════════════════════════════════╬══════════════════════════════╣
║      ║   supported  ║ replaceAll(String, String)         ║ replaceFirst(String, String) ║
║regex ╠══════════════╬════════════════════════════════════╬══════════════════════════════╣
║syntax║      not     ║ replace(CharSequence, CharSequence)║              \/              ║
║      ║   supported  ║ replace(char, char)                ║              /\              ║
╚══════╩══════════════╩════════════════════════════════════╩══════════════════════════════╝
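The crossed-out cell in the table (literal matching, first occurrence only) has no dedicated String method. One way to fill it, sketched here with a hypothetical helper named replaceFirstLiteral, is to quote both arguments yourself before calling replaceFirst:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralReplaceFirst {

    // Hypothetical helper: literal "replace first".
    // Pattern.quote makes the target literal; Matcher.quoteReplacement makes '$' and '\' literal.
    static String replaceFirstLiteral(String input, String target, String replacement) {
        return input.replaceFirst(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstLiteral("{x} + {x}", "{x}", "$1")); // $1 + {x}
        System.out.println(replaceFirstLiteral("a.b.c", ".", "!"));        // a!b.c
    }
}
```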

So if you don't need regex syntax, use a method that doesn't expect it and instead treats both the target and the replacement as literals.

Instead of replaceAll(regex, replacement), use replace(literal, replacement).
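The difference shows up as soon as the target contains a metacharacter. A small illustration with made-up sample strings:

```java
public class LiteralVsRegexDemo {
    public static void main(String[] args) {
        String s = "a.b.c";
        // As a regex, "." matches any character, so every character is replaced
        System.out.println(s.replaceAll(".", "!")); // !!!!!
        // As a literal, "." matches only the dot characters
        System.out.println(s.replace(".", "!"));    // a!b!c
        // A target starting with '{' needs no escaping with replace
        System.out.println("Hi {name}".replace("{name}", "Jim")); // Hi Jim
    }
}
```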


As you can see, there are two overloaded versions of replace. Both should work for you, since neither supports regex syntax. The main difference between them is:

  • replace(char target, char replacement) simply creates a new string, filling it with either the character from the original string or your replacement character (depending on whether the original character equals target).

  • replace(CharSequence target, CharSequence replacement) is essentially equivalent to replaceAll(Pattern.quote(target.toString()), Matcher.quoteReplacement(replacement.toString())). In other words, it behaves like replaceAll, but regex metacharacters in target and replacement are escaped for you automatically. (Up to Java 8 it was actually implemented on top of the regex engine; newer JDKs use a faster non-regex implementation with the same result.)
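A sketch that checks the equivalence from the second bullet on one sample input (the helper name viaReplaceAll is hypothetical, and a single sample is an illustration, not a proof). The replacement deliberately contains $ and \, which replaceAll would otherwise treat specially:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceEquivalence {

    // Hypothetical helper: the replaceAll-based formulation, quoting both target and replacement.
    static String viaReplaceAll(String s, CharSequence target, CharSequence replacement) {
        return s.replaceAll(Pattern.quote(target.toString()),
                            Matcher.quoteReplacement(replacement.toString()));
    }

    public static void main(String[] args) {
        String s = "cost: {p} or {p}";
        String a = s.replace("{p}", "$5\\");            // replacement is the 3 characters $ 5 \
        String b = viaReplaceAll(s, "{p}", "$5\\");
        System.out.println(a);                          // cost: $5\ or $5\
        System.out.println(a.equals(b));                // true

        // replace(char, char) works character by character, with no regex involved
        System.out.println("a.b.c".replace('.', '/')); // a/b/c
    }
}
```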

Upvotes: 3

Jon Thoms

Reputation: 10759

You don't need any extra code, just the \Q and \E constructs, as documented in Java's Pattern class.

For example, in the following code:

import java.util.regex.Pattern;

String foobar = "crazyPassword=f()ob@r{}+";
Pattern regex = Pattern.compile("\\Q" + foobar + "\\E");

the pattern compiles, and foobar's special characters are not interpreted as regex metacharacters.

The only case this doesn't handle is input that itself contains a literal \E, which would end the quoted section early. If you need to handle that too, just let me know in a comment and I'll edit the answer.
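To illustrate the \E limitation, the sketch below compares hand-written \Q...\E quoting with Pattern.quote, which escapes an embedded \E for you. The helper names and the sample inputs are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class QuoteEdgeCase {

    // Hypothetical helper: wraps the input in \Q...\E by hand.
    static boolean matchesWithManualQuote(String literal) {
        try {
            return Pattern.compile("\\Q" + literal + "\\E").matcher(literal).matches();
        } catch (PatternSyntaxException e) {
            return false; // an embedded \E ended the quoted section and left invalid regex behind
        }
    }

    // Hypothetical helper: Pattern.quote escapes an embedded \E correctly.
    static boolean matchesWithPatternQuote(String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(literal).matches();
    }

    public static void main(String[] args) {
        String ok = "crazyPassword=f()ob@r{}+";
        String tricky = "\\E(";                              // the three characters \ E (
        System.out.println(matchesWithManualQuote(ok));      // true
        System.out.println(matchesWithManualQuote(tricky));  // false
        System.out.println(matchesWithPatternQuote(tricky)); // true
    }
}
```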

Upvotes: 0

John Kugelman

Reputation: 361739

Use Pattern.quote(String):

public static String quote(String s)

Returns a literal pattern String for the specified String.

This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.

Metacharacters or escape sequences in the input sequence will be given no special meaning.

Parameters:
    s - The string to be literalized
Returns:
    A literal string replacement
Since:
    1.5
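Besides String.replaceAll, the quoted string can be passed to Pattern.compile directly. For example, here is a sketch with a hypothetical helper countLiteral that counts occurrences of user-supplied text which may contain metacharacters:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteCountDemo {

    // Hypothetical helper: counts non-overlapping occurrences of a literal using the regex engine.
    static int countLiteral(String text, String literal) {
        Matcher m = Pattern.compile(Pattern.quote(literal)).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLiteral("{a}{b}{a}", "{a}"));  // 2
        System.out.println(countLiteral("1+1=2, 1+1", "1+1")); // 2
    }
}
```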

Upvotes: 8

Fabian Damken

Reputation: 1517

You can use

java.util.regex.Pattern.quote(java.lang.String)

to escape metacharacters used by regular expressions.
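The same applies to other String methods that take a regex, such as split. A minimal sketch with a hypothetical helper splitOnLiteral, where a user-supplied delimiter like | would otherwise be read as regex alternation:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteSplitDemo {

    // Hypothetical helper: String.split takes a regex, so quote the delimiter first.
    static String[] splitOnLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Unquoted, "|" is an empty alternation that matches between characters
        System.out.println(Arrays.toString("a|b|c".split("|")));
        // Quoted, "|" is treated as a plain delimiter
        System.out.println(Arrays.toString(splitOnLiteral("a|b|c", "|"))); // [a, b, c]
    }
}
```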

Upvotes: 4
