Reputation: 1986
I am trying to parse a rule in Java and read its contents with a regex, but since I am very new to regexes I have run into several problems.
First, I try to parse a predicate with this regex (I don't know whether it is too complicated): "([a-zA-Z]+)\\(([\\?]?[a-zA-Z0-9]+)?(,[\\?]?[a-zA-Z0-9]+)*\\)", and I just found that it is completely wrong... The predicate should look something like this (I am too lazy to write the complete expression): p(), p(?a), p(?a,?b,c,?d). The predicate name has to be a string containing alphabetic characters only, and each argument is a string containing alphabetic characters only, optionally beginning with ?.
I found two problems here, given the input p(a,b,c):
1. When I retrieve the groups (using Matcher), the results are only p(a,b,c), p, a, and ,c. How can I retrieve the b as well?
2. How can I get rid of the , (comma sign) inside the group, given that the repetition has to include it?
The other case: when I input p(), why do I get a group whose value is null?
Any idea how to fix this?
Upvotes: 2
Views: 1736
Reputation: 75222
One of the "arg" values in your longest sample string is ?b?, which doesn't seem to match your description. Remove that and your regex matches all the samples, but that still leaves you with the problem of extracting the individual arguments. The easiest way to do that in Java is to capture all the arguments as one string, then split that string to break out the individual arguments.
As @Tomalak said, your regex is pretty good; the only thing I can see wrong with it is the ? after the group representing the first argument. It should control the whole argument string, not just the first argument. I mean, if there's no first argument, there's no point looking for a second, third, etc., is there? Here's how I would do it:
(?:[?]?[a-zA-Z0-9]+(?:,[?]?[a-zA-Z0-9]+)*)?
That will match nothing, or one argument, or several arguments separated by commas, but it won't match (for example) ,a or ,?a,b, as your regex does. Here's the full regex in the form of a Java string literal:
"([a-zA-Z]+)\\(((?:\\??[a-zA-Z0-9]+(?:,\\??[a-zA-Z0-9]+)*)?)\\)"
The predicate name is captured in group #1 and the arguments are captured in group #2. If there are no arguments, group #2 will contain an empty string (not a null). Otherwise, you can break out the individual arguments by splitting it on commas.
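To make the difference concrete, here is a small sketch that runs the regex from the question and the revised one side by side. The class and method names (PredicateRegexComparison, accepts, argumentList) are made up for illustration; only the two patterns come from this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, only the patterns come from the thread.
class PredicateRegexComparison {
    // Regex from the question: the first argument and the ",arg" repetition are optional independently.
    static final Pattern ORIGINAL = Pattern.compile(
        "([a-zA-Z]+)\\(([\\?]?[a-zA-Z0-9]+)?(,[\\?]?[a-zA-Z0-9]+)*\\)");

    // Revised regex: the whole argument list is optional as a single unit.
    static final Pattern REVISED = Pattern.compile(
        "([a-zA-Z]+)\\(((?:\\??[a-zA-Z0-9]+(?:,\\??[a-zA-Z0-9]+)*)?)\\)");

    // True if the pattern matches the entire input.
    static boolean accepts(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    // Raw argument string (group 2 of REVISED), or null if the input does not match at all.
    static String argumentList(String input) {
        Matcher m = REVISED.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "p()", "p(a,b,c)", "p(,a)", "p(,?a,b)" }) {
            System.out.printf("%-10s original=%b revised=%b%n",
                s, accepts(ORIGINAL, s), accepts(REVISED, s));
        }
        // Group 2 always participates in the match, so it is "" for p(), never null.
        System.out.println("args of p(): \"" + argumentList("p()") + "\"");
    }
}
```

Both patterns accept p() and p(a,b,c); only the original accepts the malformed p(,a) and p(,?a,b).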
BTW, you can escape most metacharacters with backslashes (\?) or square brackets ([?]); you don't need to do both. If it's only the one character (i.e., not part of a real character class like [!.?]), I advise using backslashes. I know it's the same number of characters in Java, but I think the backslashes make it a little more self-documenting.
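A quick way to convince yourself that the escaping styles are interchangeable is a sketch like the following. The class name EscapeStyles and the [a-z]+ tail are just assumptions for the demo.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: three spellings of "a literal question mark followed by letters".
class EscapeStyles {
    static final Pattern BACKSLASH = Pattern.compile("\\?[a-z]+");   // regex: \?[a-z]+
    static final Pattern BRACKETS  = Pattern.compile("[?][a-z]+");   // regex: [?][a-z]+
    static final Pattern BOTH      = Pattern.compile("[\\?][a-z]+"); // regex: [\?][a-z]+ (redundant)

    // True if all three patterns give the same full-match verdict for the input.
    static boolean allAgree(String input) {
        boolean expected = BACKSLASH.matcher(input).matches();
        return expected == BRACKETS.matcher(input).matches()
            && expected == BOTH.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "?abc", "abc", "??a" }) {
            System.out.printf("%-5s matches=%b allAgree=%b%n",
                s, BACKSLASH.matcher(s).matches(), allAgree(s));
        }
    }
}
```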
EDIT: Here's the code I used:

    // Needs java.util.regex.Matcher and java.util.regex.Pattern.
    String[] inputs = { "p()", "p(?a)", "p(?a,?b,c,?d)", "p(a,b,c)" };
    Pattern p = Pattern.compile(
        "([a-zA-Z]+)\\(((?:\\??[a-zA-Z0-9]+(?:,\\??[a-zA-Z0-9]+)*)?)\\)");
    for (String s : inputs)
    {
        Matcher m = p.matcher(s);
        // matches() requires the whole input to be a single predicate
        if (m.matches())
        {
            System.out.printf("%nFull match: %s%nPredicate name:%n %s%n",
                m.group(), m.group(1));
            // Group 2 holds the raw argument list; it is "" (not null) for p()
            String allArgs = m.group(2);
            if (allArgs.length() == 0)
            {
                System.out.println("No arguments");
            }
            else
            {
                System.out.println("Arguments:");
                // The regex has already validated the list, so a plain split is safe
                for (String arg : allArgs.split(","))
                {
                    System.out.printf(" %s%n", arg);
                }
            }
        }
    }
Upvotes: 1
Reputation:
"The predicate should be something like this (I am too lazy to write the complete expression), p(), p(?a), p(?a,?b?,c,?d)."
I wanted to add a comment, but IE6 is giving me trouble. If you give a better explanation, I will give you a solution.
What you are dealing with is text! Don't try to whitewash it as something more extravagant.
Being 'lazy' does not explain what p(), p(?a), p(?a,?b?,c,?d) means. Every single text character/symbol must be fully understood.
Regex is powerful and can be extremely daunting. A regex formula (an abstraction) cannot be inferred from another abstraction.
I'm sorry, I just can't understand the parameters. I'm going to delete my post...
(Apparently, I can't delete it. If someone could delete this for me, thanks!)
Upvotes: 0
Reputation: 33908
There are two problems here I found, given element p(a,b,c)
(?:,(\w+))
The other case, when I input p(), why did it get a group in which the element is null?
Because the groups that are supposed to match the "parameters" did not take part in the match at all, so they are undefined. This is how capturing groups work. You can pick/filter which ones you want after the match.
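Both effects can be reproduced with the regex from the question. This is only a sketch; the class name GroupBehavior and the helper groups are made up for the demo.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing how Java reports repeated and unmatched groups.
class GroupBehavior {
    // Regex from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
        "([a-zA-Z]+)\\(([\\?]?[a-zA-Z0-9]+)?(,[\\?]?[a-zA-Z0-9]+)*\\)");

    // Returns groups 0..groupCount() for a full match, or an empty array if there is no match.
    static String[] groups(String input) {
        Matcher m = ORIGINAL.matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i); // null when the group did not take part in the match
        }
        return result;
    }

    public static void main(String[] args) {
        // A repeated group only remembers its last iteration, so ",b" is overwritten by ",c".
        System.out.println(Arrays.toString(groups("p(a,b,c)"))); // [p(a,b,c), p, a, ,c]
        // Groups 2 and 3 never matched anything, so both are null.
        System.out.println(Arrays.toString(groups("p()")));      // [p(), p, null, null]
    }
}
```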
You want to use/construct a proper parser for this and not just use one regex.
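As a rough idea of what that could look like, here is a minimal hand-written parser for the predicate syntax described in the question. Everything in it (the class PredicateParser, its method names, the error messages) is hypothetical, and allowing digits in arguments is an assumption carried over from the [a-zA-Z0-9] class in the question's regex.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a tiny parser for: name '(' [ arg { ',' arg } ] ')'
class PredicateParser {
    private final String text;
    private int pos = 0;

    private PredicateParser(String text) {
        this.text = text;
    }

    // Returns [name, arg1, arg2, ...]; throws IllegalArgumentException on malformed input.
    static List<String> parse(String text) {
        PredicateParser p = new PredicateParser(text);
        List<String> parts = new ArrayList<>();
        parts.add(p.word(false));            // predicate name: letters only
        p.expect('(');
        if (p.peek() != ')') {               // non-empty argument list
            parts.add(p.word(true));
            while (p.peek() == ',') {
                p.pos++;                     // consume the comma
                parts.add(p.word(true));
            }
        }
        p.expect(')');
        if (p.pos != text.length()) {
            throw p.error("unexpected trailing input");
        }
        return parts;
    }

    // Current character, or '\0' at end of input.
    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw error("expected '" + c + "'");
        }
        pos++;
    }

    // Reads a name; arguments may start with '?' and may contain digits (assumption).
    private String word(boolean isArgument) {
        int start = pos;
        if (isArgument && peek() == '?') {
            pos++;
        }
        int firstChar = pos;
        while (pos < text.length() && isAllowed(text.charAt(pos), isArgument)) {
            pos++;
        }
        if (pos == firstChar) {
            throw error("expected a name");
        }
        return text.substring(start, pos);
    }

    private static boolean isAllowed(char c, boolean isArgument) {
        boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        return letter || (isArgument && c >= '0' && c <= '9');
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at index " + pos + " in \"" + text + "\"");
    }

    public static void main(String[] args) {
        System.out.println(parse("p(?a,?b,c,?d)")); // [p, ?a, ?b, c, ?d]
        System.out.println(parse("p()"));           // [p]
    }
}
```

Compared with a single regex, this reports where the input went wrong and returns every argument directly, at the cost of more code.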
Upvotes: 0