Reputation: 139
I have this code:
private boolean setListOfRegex() {
    try {
        listRegex = new ArrayList<Pattern>();
        tokens = new ArrayList<String[]>();
        // One rule per line, in the form: regex;TOKEN_NAME;id
        String[] regs = regexFile.split("\n");
        for (int i = 0; i < regs.length; i++) {
            String[] info = regs[i].split(";");
            Pattern p = Pattern.compile(info[0]);
            listRegex.add(p);
            String[] a = { info[1], info[2] };
            tokens.add(a);
        }
        return true;
    } catch (Exception e) {
        System.out.println(e.getMessage());
        return false;
    }
}
The array regs is the result of splitting regexFile, where regexFile holds the contents of a .txt file like this one:
\\+|\\*|\\*\\*|\\-|\\=;OPERATOR;0
\\w+\s\\=;VARIABLE;1
\\(;OPEN_BRACKET;2
\\);CLOSE_BRACKET;3
[0-9]+;NUMBER;4
The function works fine, but the problem is the line:
Pattern p = Pattern.compile(info[0]);
When I replace info[0] with the literal "\\\\(", it works fine, but with the variable it doesn't. I printed info[0] and it shows the same string, yet when I leave the variable in place I get this error:
Unclosed group near index 3
\\(
It seems the parenthesis is being treated as the opening parenthesis of a regex group. I think that's because info[0] should be "\\\\\\\\\(". I tried that expression, but the error remains:
Unclosed group near index 5
\\\\(
How can I store the expression "\\\\(" in info[0]? The two expressions before this one:
\\+|\\*|\\*\\*|\\-|\\=;OPERATOR;0
\\w+\s\\=;VARIABLE;1
compile fine.
Upvotes: 0
Views: 406
Reputation: 79015
When I change the info[0] for "\\\\(", it works fine, but with the variable doesn't
This is not correct.
Demo:
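import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The regex source is \\( : an escaped backslash, then an unclosed group
        Pattern.compile("\\\\(");
    }
}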
Output:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed group near index 3
\\(
...
And as expected, the behaviour remains the same with a variable:
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "\\\\(";
        Pattern.compile(str);
    }
}
Output:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed group near index 3
\\(
...
The string literal \\\\( means a \ followed by an unescaped (, which the regex engine interprets as the start of a group. This is because the Java compiler turns \\\\ into the two characters \\, and the regex engine reads those two characters as one escaped (literal) \, just as it reads \[ (written \\[ in a Java literal) as an escaped [. Nothing is left over to escape the ( that follows.
Thus, you need \\ to escape (, and therefore your string should be \\\\\\(, where \\\\ specifies an escaped \ and \\( specifies an escaped (.
Demo:
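import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        //...
        // Both compile: the regex is an escaped \ followed by an escaped (
        Pattern.compile("\\\\\\(");
        String str = "\\\\\\(";
        Pattern.compile(str);
        //....
    }
}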
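To make the three escaping levels concrete, here is a minimal sketch that compares them side by side. The class name EscapeDemo and the helper compiles are made up for illustration; only Pattern.compile and Pattern.matches come from the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EscapeDemo {
    // Hypothetical helper: true if the regex source compiles without a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Literal "\\(" is the regex \( : an escaped parenthesis.
        System.out.println(Pattern.matches("\\(", "("));       // true
        // Literal "\\\\(" is the regex \\( : an escaped backslash, then an open group.
        System.out.println(compiles("\\\\("));                  // false
        // Literal "\\\\\\(" is the regex \\\( : an escaped backslash, then an escaped parenthesis.
        System.out.println(Pattern.matches("\\\\\\(", "\\("));  // true
    }
}
```

Note that the last pattern matches the two-character text \( and not a bare (, so it is only the right choice if the input really contains a backslash before the parenthesis.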
Upvotes: 2
Reputation: 265161
The file content \\( is equivalent to the Java string literal "\\\\(", i.e. two (literal) backslashes followed by an opening parenthesis. To escape a meta character in a regular expression, it needs to be preceded by a single backslash. This includes the backslash character itself, which can be escaped with another preceding backslash.
If you cannot modify the file, then you have to remove the duplicated backslashes and replace them with a single backslash before compiling them as a regular expression:
// replace each pair of literal backslash characters with a single literal backslash character:
info[0] = info[0].replace("\\\\", "\\");
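As a rough end-to-end sketch of that fix, assuming a rule line shaped like the ones in the question's file: the class name RegexFileDemo and the helper compileFromFile are hypothetical, and the string literal stands in for a line read from the .txt file.

```java
import java.util.regex.Pattern;

public class RegexFileDemo {
    // Hypothetical helper: collapses each doubled backslash read from the file
    // into a single one, then compiles the result. String.replace is literal,
    // so no regex escaping applies to its arguments.
    static Pattern compileFromFile(String rawRegex) {
        String normalized = rawRegex.replace("\\\\", "\\");
        return Pattern.compile(normalized);
    }

    public static void main(String[] args) {
        // This literal holds the same characters as the file line \\(;OPEN_BRACKET;2
        String line = "\\\\(;OPEN_BRACKET;2";
        String[] info = line.split(";");
        Pattern p = compileFromFile(info[0]);
        System.out.println(p.pattern());              // prints \(
        System.out.println(p.matcher("(").matches()); // prints true
    }
}
```

The alternative is to write the file with single backslashes in the first place (for example \(;OPEN_BRACKET;2), since text read from a file is not processed like a Java string literal and needs only the regex-level escape.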
Upvotes: 0