Reputation: 375
I need to validate the below string using regular expression in Java:
String alphanumericList ="[\"State\"; \"districtOne\";\"districtTwo\"]";
I have tried the following:
String pattern="^\\[ (\"[\\w]\")\\s+(?:\\s+;\\s+ (\"[\\w]\")+) \\]$";
String alphanumericList ="[\"State1\"; \"district1\";\"district2\"]";
But the validation fails. Any help is appreciated.
Upvotes: 0
Views: 389
Reputation:
Try this
static final String HEAD = "^\\[\\s*";
static final String TAIL = "\\s*\\]$";
static final String SEP = "\\s*;\\s*";
static final String ITEM = "\"[^\"]*\"";
static final String PAT = HEAD + ITEM + "(" + SEP + ITEM + ")*" + TAIL;
Upvotes: 0
Reputation: 89584
You are looking for this pattern:
String pattern = "\\[\\s*\"[^\"]*\"\\s*(?:;\\s*\"[^\"]*\"\\s*)*+\\]";
No need to add anchors since there are implicit if you use the matches()
method since this method is the more appropriate for validation tasks.
pattern details:
\\[ # a literal opening square bracket
\\s* # optional whitespaces
\" # literal quote
[^\"]* # content between quotes: chars that are not a quote (zero or more)
\"
\\s*
(?: # non-capturing group:
; # a literal semi-colon
\\s*
\" # quoted content
[^\"]*
\"
\\s*
)*+ # repeat this group zero or more time (with a possessive quantifier)
\\] # a literal closing square bracket
The possessive quantifier prevent the regex engine to backtrack into repeated non-capturing groups if the closing square bracket is not present. It is a security to prevent uneeded backtracking and to make the pattern fail faster. Not that you can make possessive other quantifiers too before the non-capturing group for the same reason. More about possessive quantifiers.
I decided to describe the content between quotes in this way: \"[^\"]*\"
, but you can be more restrictive, allowing for example only words characters: \"\\w*\"
or more general, allowing escaped quotes: \"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*+\"
Upvotes: 0
Reputation: 88737
I'll try and mark the possible issues with your expression (issue numbers above the chars):
1 4 2 3 1 4 5 1
"^\\[ (\"[\\w]\")\\s+(?:\\s+;\\s+ (\"[\\w]\")+) \\]$"
As you can see, there are at least 5 issues:
\\s+
), which the input doesn't seem to contain. You probably want to remove that or change the quantifier from +
to *
.(\"[\\w]\")+
means "a double quote, a single word character, a double quote" and all at least once. Besides that, \w
is already a character class, you the brackets around that are not needed here (unless you want to add more classes or characters inside). You probably want (\"\\w+\")
instead.(?:\\s+;\\s+ (\"[\\w]\")+)
) doesn't have a quantifier, i.e. it would be expected exactly once. You probably want to put the quantifier +
or *
after that group.Another point that's not a direct issue is the capturing group around \"[\\w]\"
. Since you seem to want to match multiple strings after semicolons you'd only be able to capture one of the matching groups. Hence you'd most probably not be able to do what you intended anyways and thus the group is not necessary.
That said the fixed original expression would look like this:
pattern = "^\\[(\"\\w+\")(?:\\s*;\\s+\"\\w+\")+\\]$"
Upvotes: 2