pathikrit

Reputation: 33479

How to get the names of the regex named capturing group in a match in Java?

Given:

String text = "FACEBOOK is buying GOOGLE and FACE BOOK";

and:

Pattern pattern = Pattern.compile("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))");
Matcher matcher = pattern.matcher(text);

I want to get something like this:

Group=FB matches substring="FACEBOOK" at position=[0, 8)
Group=GOOGL matches substring="GOOGLE" at position=[19, 25)
Group=FB matches substring="FACE BOOK" at position=[30, 39)

However, I have been unable to get the group name. Here is my attempt in Scala:

import java.util.regex.Pattern

val pattern = Pattern.compile("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))")
val text = "FACEBOOK is buying GOOGLE and FACE BOOK"
val matcher = pattern.matcher(text)

while (matcher.find()) {
  println(s"Group=???? matches substring=${matcher.group()} at position=[${matcher.start},${matcher.end})")
}

EDIT: Someone marked this as a duplicate of "Get group names in java regex", but this is a different question. This one asks: given a match, how do I find the name of the group that matched? The other question asks how to get the group names (or the name-to-index mapping) from a Pattern object.

Upvotes: 3

Views: 4532

Answers (3)

M. Justin

Reputation: 21258

Java 20 added the namedGroups() method to MatchResult (which Matcher implements). It returns a map from group name to group index, so in your example you can find the current match's group name by checking which named group is non-null.

Here is a Java implementation:

// Requires Java 20 or later (Matcher.namedGroups()).
while (matcher.find()) {
    System.out.printf("Group=%s matches substring=%s at position=[%s,%s)%n",
            getCurrentGroupName(matcher),
            matcher.group(), matcher.start(), matcher.end());
}

// Returns the first named group that participated in the current match, or null if none did.
private static String getCurrentGroupName(Matcher matcher) {
    return matcher.namedGroups().keySet().stream()
            .filter(n -> matcher.group(n) != null)
            .findFirst().orElse(null);
}

Upvotes: 0

Thilo

Reputation: 262724

You could use the named-regexp Java library. It is a thin wrapper around java.util.regex that adds named capture group support, primarily for pre-Java-7 users, but it also provides methods to inspect the group names (which appear to be missing even from Java 11).

Upvotes: 2

pathikrit

Reputation: 33479

Here is my attempt in Scala:

import java.util.regex.{MatchResult, Pattern}

class GroupNamedRegex(pattern: Pattern, namedGroups: Set[String]) {
  def this(regex: String) = this(Pattern.compile(regex), 
    "\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>".r.findAllMatchIn(regex).map(_.group(1)).toSet)

  def findNamedMatches(s: String): Iterator[GroupNamedRegex.Match] = new Iterator[GroupNamedRegex.Match] {
    private[this] val m = pattern.matcher(s)
    private[this] var _hasNext = m.find()

    override def hasNext = _hasNext

    override def next() = {
      val ans = GroupNamedRegex.Match(m.toMatchResult, namedGroups.find(group => m.group(group) != null))
      _hasNext = m.find()
      ans
    }
  }
}

object GroupNamedRegex extends App {
  case class Match(result: MatchResult, groupName: Option[String])

  val r = new GroupNamedRegex("(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))")
  println(r.findNamedMatches("FACEBOOK is buying GOOGLE and FACE BOOK FB").map(s => s.groupName -> s.result.group()).toList)
}
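
For anyone who wants the same idea in plain Java without Java 20's namedGroups(), here is a rough sketch. The class name NamedGroupMatches and the helper methods groupNames, currentGroupName, and describe are made up for illustration. It uses the same approach as the Scala code above: scrape the group names out of the regex source, then check which named group is non-null for each match. It assumes the regex does not contain escaped or literal text that merely looks like "(?<name>".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedGroupMatches {
    // Matches the opening of a named group such as "(?<FB>".
    // Assumption: nothing else in the regex source looks like a named-group opener.
    private static final Pattern GROUP_NAME =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Collects the group names declared in the regex source, in declaration order.
    static Set<String> groupNames(String regex) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = GROUP_NAME.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Returns the first named group that participated in the current match, or null.
    static String currentGroupName(Matcher matcher, Set<String> names) {
        for (String name : names) {
            if (matcher.group(name) != null) {
                return name;
            }
        }
        return null;
    }

    // Builds one description line per match, in the format asked for in the question.
    static List<String> describe(String regex, String text) {
        Set<String> names = groupNames(regex);
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> lines = new ArrayList<>();
        while (matcher.find()) {
            lines.add(String.format(
                    "Group=%s matches substring=\"%s\" at position=[%d, %d)",
                    currentGroupName(matcher, names),
                    matcher.group(), matcher.start(), matcher.end()));
        }
        return lines;
    }

    public static void main(String[] args) {
        String regex = "(?<FB>(FACE(\\p{Space}?)BOOK))|(?<GOOGL>(GOOGL(E)?))";
        describe(regex, "FACEBOOK is buying GOOGLE and FACE BOOK")
                .forEach(System.out::println);
    }
}
```

Note that if several named groups take part in the same match (for example, nested named groups), this only reports the first one in declaration order.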

Upvotes: 1
