Reputation: 509
I'm having trouble getting the position of a word using the regex and Matcher classes in Java.
Let's say I have the sentence "The quick brown fox jumps over the laziest dog in the world", and I want my regex to return the position of a particular word.
If the input is "brown", it should return 3, because it is the 3rd word in the sentence. If it's "quick", it should return 2, because it is the 2nd word. If it's "world", it should return 12. I hope I have given enough examples.
My attempt so far:
Pattern p = Pattern.compile("(?i)(?<=^|[^A-Z0-9a-z])enemy(?=$|[^A-Z0-9a-z])");
Matcher m = p.matcher("The quickman is an enemy from megaman.");
if (m.find()) {
    System.out.println(m.start());
    System.out.println(m.end());
    System.out.println(m.group());
}
But matcher.start() returns only the character index of the match within the string (19 here), not the position of the word. Any hint or help would be appreciated.
Upvotes: 1
Views: 1616
Reputation: 42041
Here is an example for the word brown:
\b(?:(brown)|(\S+))\b
// \b(?:(brown)|(\S+))\b
//
// Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Default line breaks
//
// Assert position at a word boundary (position preceded or followed, but not both, by a Unicode letter, digit, or underscore) «\b»
// Match the regular expression below «(?:(brown)|(\S+))»
//    Match this alternative (attempting the next alternative only if this one fails) «(brown)»
//       Match the regex below and capture its match into backreference number 1 «(brown)»
//          Match the character string “brown” literally (case sensitive) «brown»
//    Or match this alternative (the entire group fails if this one fails to match) «(\S+)»
//       Match the regex below and capture its match into backreference number 2 «(\S+)»
//          Match a single character that is NOT a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\S+»
//             Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Assert position at a word boundary (position preceded or followed, but not both, by a Unicode letter, digit, or underscore) «\b»
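To see the two capturing groups described above in action, here is a minimal sketch that reports which group captured each token. The class name GroupDemo and the helper describe are made up for illustration; only the regex comes from this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Builds one line per match, naming the capturing group that took part in it.
    static String describe(String input) {
        Matcher m = Pattern.compile("\\b(?:(brown)|(\\S+))\\b").matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Only one alternative can succeed per match, so the group
            // belonging to the other alternative is null.
            sb.append(m.group(1) != null ? "group 1: " : "group 2: ")
              .append(m.group())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("quick brown fox"));
    }
}
```

For the input "quick brown fox", group 1 is non-null only for the token brown; every other token lands in group 2.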
Example program to find brown:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class HelloWorld
{
    public static void main(String[] args)
    {
        // Number of words seen so far; this becomes the 1-based word position.
        int counter = 0;
        String subjectString = "The quick brown fox jumps over the laziest dog in the world";
        String testWordString = "brown";
        try {
            Pattern regex = Pattern.compile("\\b(?:(brown)|(\\S+))\\b");
            Matcher regexMatcher = regex.matcher(subjectString);
            while (regexMatcher.find()) {
                // Increment the count for each word we pass.
                counter++;
                // matched text: regexMatcher.group()
                // match start: regexMatcher.start()
                // match end: regexMatcher.end()
                System.out.println(regexMatcher.group());
                // If the matched text `regexMatcher.group()` equals our test word `brown`, exit the loop.
                if (testWordString.equals(regexMatcher.group())) {
                    System.out.println("found the word: " + counter);
                    break;
                }
            }
        } catch (PatternSyntaxException ex) {
            // Syntax error in the regular expression
            System.err.println("Invalid regex: " + ex.getMessage());
        }
    }
}
This outputs:
The
quick
brown
found the word: 3
Note that the example can be simplified by removing the explicit alternative for brown, going from:
\b(?:(brown)|(\S+))\b
to:
\b(\S+)\b
My thinking, though, was to let you use the separate capturing groups to tell whether you had found your match, rather than comparing every word against the string brown again.
I'll leave that as an exercise for you.
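If you want something to compare your attempt against, here is one possible sketch of that idea. It is not the only way to do it. The class name WordPosition, the method wordPosition, the -1 return value for a missing word, and the case-insensitive flag are all my own assumptions for illustration; Pattern.quote is used so that any regex metacharacters in the search word are treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordPosition {
    // Returns the 1-based position of `word` in `sentence`, or -1 if it is not found.
    static int wordPosition(String sentence, String word) {
        // Group 1 captures the target word; group 2 captures any other token.
        Pattern regex = Pattern.compile(
                "\\b(?:(" + Pattern.quote(word) + ")|(\\S+))\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = regex.matcher(sentence);
        int counter = 0;
        while (m.find()) {
            counter++;
            // A non-null group 1 means the first alternative matched,
            // so no separate string comparison is needed.
            if (m.group(1) != null) {
                return counter;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "The quick brown fox jumps over the laziest dog in the world";
        System.out.println(wordPosition(s, "brown"));
        System.out.println(wordPosition(s, "world"));
    }
}
```

Because of the trailing \b, a search word that is only a prefix of a longer token (for example quick inside quickman) falls through to group 2 and is counted as an ordinary word rather than reported as a match.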
Upvotes: 2