Reputation: 97
I have a block of text that has information encoded as follows:
[tag 1] some text [tag 2] more text [tag 3] even more text
I am in the process of creating a regular expression in Java that will extract the encoded information into separate strings, such as:
[tag 1] some text
[tag 2] more text
[tag 3] even more text
The regular expression that I have created is (for regular pattern matching): "(\[.+?\][^[]+)"
This regular expression works well in Notepad++ and in two online tools.
In Java this regular expression statement produces a runtime exception:
Pattern pattern = Pattern.compile("(\\[.+?\\][^[]+)");
Exception details:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 13
(\[.+?\][^[]+)
             ^
Do I have to escape the "[" inside the negated character class? If so, how?
Upvotes: 2
Views: 151
Reputation: 44376
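This looks like a bug in the Java implementation, but it is documented behavior: java.util.regex supports nested character classes (unions such as [a-d[m-p]]), so an unescaped [ inside a character class opens a new class.
Most regex flavors do not require you to escape it there, but Java does, so escape it.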
(\[.+?\][^\[]+)
"(\\[.+?\\][^\\[]+)"
It could be considered good practice to escape special characters even when escaping is not strictly required. It also helps avoid errors like this one.
Upvotes: 0
Reputation: 208545
Escape the [ within the negated character class. Most regex flavors do not require this inside a character class, but Java treats an unescaped [ there as the start of a nested character class, so it has to be escaped. Escaping does not change the meaning of the character class.
Try the following:
(\[.+?\][^\[]+)
Or for the Java code:
Pattern pattern = Pattern.compile("(\\[.+?\\][^\\[]+)");
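As a rough sketch of how the escaped pattern could be used to pull out the segments, the following loops over the matches with Matcher.find(). The class name TagSplitter and the method split are hypothetical, and trimming each match is an assumption made here, because [^\[]+ also consumes the space before the next tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class TagSplitter {
    // Same pattern as in the answer, with the [ inside the negated class escaped.
    private static final Pattern SEGMENT = Pattern.compile("(\\[.+?\\][^\\[]+)");

    static List<String> split(String text) {
        List<String> segments = new ArrayList<>();
        Matcher matcher = SEGMENT.matcher(text);
        while (matcher.find()) {
            // Assumption: the whitespace before the next tag is not wanted.
            segments.add(matcher.group(1).trim());
        }
        return segments;
    }

    public static void main(String[] args) {
        String input = "[tag 1] some text [tag 2] more text [tag 3] even more text";
        for (String segment : split(input)) {
            System.out.println(segment);
        }
    }
}
```

With the sample text from the question, this should print the three tagged segments on separate lines.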
Upvotes: 3
Reputation: 9937
You need to escape the square bracket, just like you escaped the others earlier:
(\\[.+?\\][^\\[]+)
The runtime exception is thrown because Java's regex parser treats the second [ in [^[] as opening a nested character class that is never closed.
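A small check of this behavior, as a sketch only (the class BracketEscapeDemo and the helper compiles are hypothetical names): java.util.regex allows a character class to contain a nested class, for example the union [a-d[m-p]], which is why an unescaped [ after [^ is read as an opening bracket rather than as a literal.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original thread.
class BracketEscapeDemo {
    // Returns true if the regex compiles, false if Java rejects its syntax.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unescaped [ inside the negated class: rejected as an unclosed class.
        System.out.println(compiles("(\\[.+?\\][^[]+)"));
        // Escaped [ inside the negated class: accepted.
        System.out.println(compiles("(\\[.+?\\][^\\[]+)"));
        // Nested class used on purpose: union of a-d and m-p.
        System.out.println(Pattern.matches("[a-d[m-p]]", "n"));
    }
}
```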
Upvotes: 1