Reputation: 3202
I have the following pattern:
Pattern TAG = Pattern.compile("(<[\\w]+>)|(</[\\w]+>)");
Please note the | in the pattern.
And I have a method that does some processing with this pattern
private String format(String s) {
    Matcher m = TAG.matcher(s);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        // This is where I need to find out which part
        // of the | (or) matched in the pattern
        // to perform additional processing
    }
    return sb.toString();
}
I would like to perform different processing depending on which side of the OR matched in the regex. I know I can split the pattern into two separate patterns and match each one, but that is not the solution I am looking for: my actual regex is much more complex, and what I am trying to accomplish works best with a single loop and a single regex. So my question is:
Is there a way in Java to find out which part of the OR matched in the regex?
EDIT
I am also aware of the m.group() functionality. It does not work for my case. The example below prints out <TAG> and </TAG>. So on the first iteration of the loop it matches on <[\\w]+> and on the second iteration it matches on </[\\w]+>. However, I need to know which part matched on each iteration.
static Pattern u = Pattern.compile("<[\\w]+>|</[\\w]+>");

public static void main(String[] args) {
    String xml = "<TAG>044453</TAG>";
    Matcher m = u.matcher(xml);
    while (m.find()) {
        System.out.println(m.group(0));
    }
}
Upvotes: 2
Views: 123
Reputation: 124225
You don't have to use [] with \\w since it is already a character class. You can also wrap each alternative of the OR in parentheses so that you can use them as groups (a group that did not take part in the match returns null). So your code can look like this:
static Pattern u = Pattern.compile("(<\\w+>)|(</\\w+>)");

public static void main(String[] args) {
    String xml = "<TAG>044453</TAG>";
    Matcher m = u.matcher(xml);
    while (m.find()) {
        if (m.group(1) != null) { // <- group 1 (<\\w+>)
            System.out.println("I found <...> tag: " + m.group(0));
        } else { // if it wasn't (<\\w+>), then it had to be (</\\w+>) that was matched
            System.out.println("I found </...> tag: " + m.group(0));
        }
    }
}
You can also change the pattern slightly to <(/?)\\w+>, making the / part optional and placing it in parentheses (which in this case makes it group 1). This way, if the tag has no /, group 1 will contain only the empty String "", so you can change the logic to something like:
if ("".equals(m.group(1))) { // group 1 is empty: no slash, so an opening tag
    System.out.println("I found <...> tag: " + m.group(0));
} else {
    System.out.println("I found </...> tag: " + m.group(0));
}
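To tie this back to the format method from the question, here is a minimal sketch of the optional-slash pattern used in a single replace loop. The class name TagFormatter, the extra group 2 for the tag name, and the "[open ...]" / "[close ...]" replacement text are all hypothetical placeholders for whatever processing is actually needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagFormatter {
    // Group 1 captures the optional slash; group 2 (an addition for this
    // sketch, not part of the original pattern) captures the tag name.
    private static final Pattern TAG = Pattern.compile("<(/?)(\\w+)>");

    static String format(String s) {
        Matcher m = TAG.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Group 1 is "" for an opening tag and "/" for a closing tag.
            String replacement = m.group(1).isEmpty()
                    ? "[open " + m.group(2) + "]"    // placeholder processing
                    : "[close " + m.group(2) + "]";  // placeholder processing
            // quoteReplacement guards against '$' or '\' in the replacement.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("<TAG>044453</TAG>"));
    }
}
```

Because group 1 always takes part in the match here, it is never null, so checking for the empty string is enough.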
Upvotes: 0
Reputation: 5737
Take a look at the group() method on Matcher; you can do something like this:
if (m.group(1) != null) {
    // The first grouped parenthesized section matched
} else if (m.group(2) != null) {
    // The second grouped parenthesized section matched
}
EDIT: reverted to original group numbers - the extra parens were not needed. This should work with a pattern like:
static Pattern TAG = Pattern.compile("(<[\\w]+>)|(</[\\w]+>)");
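For completeness, a self-contained sketch of this check inside the find() loop might look like the following. The class name BranchDetector, the classify helper, and the "open:" / "close:" labels are made up for illustration; the pattern is the same one as above with the redundant brackets around \\w dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BranchDetector {
    // Each alternative sits in its own capturing group.
    private static final Pattern TAG = Pattern.compile("(<\\w+>)|(</\\w+>)");

    // Returns one label per match, recording which alternative matched.
    static List<String> classify(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(s);
        while (m.find()) {
            if (m.group(1) != null) {
                // The left alternative (opening tag) matched.
                result.add("open:" + m.group(1));
            } else if (m.group(2) != null) {
                // The right alternative (closing tag) matched.
                result.add("close:" + m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify("<TAG>044453</TAG>"));
    }
}
```

On each iteration exactly one of the two groups is non-null, which is what lets a single loop branch on the alternative that matched.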
Upvotes: 1
Reputation: 36339
You should rewrite your patterns by factoring out common parts:
xy|xz => x(y|z)
xy|x => xy?
yx|x => y?x
Then, by putting the interesting parts like y? in parentheses, you can check with group() whether they matched. Note that (y)? yields null when y is absent, while (y?) yields an empty string.
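Applied to the tag pattern from the question, the yx|x => y?x rule turns </\\w+>|<\\w+> into <(/)?\\w+>. The sketch below assumes that factoring; the class name FactoredPattern and the isClosing helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FactoredPattern {
    // The common part <\w+> is factored out; only the optional slash is grouped.
    // With (/)? the group is null when the slash is absent.
    private static final Pattern TAG = Pattern.compile("<(/)?\\w+>");

    // Returns true for a closing tag, false for an opening tag.
    static boolean isClosing(String tag) {
        Matcher m = TAG.matcher(tag);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a tag: " + tag);
        }
        return m.group(1) != null;
    }

    public static void main(String[] args) {
        System.out.println(isClosing("<TAG>"));
        System.out.println(isClosing("</TAG>"));
    }
}
```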
Upvotes: 0