Simon Hyttfors

Reputation: 33

Finding the index of a character to match in a specific regex

I have a String that begins with a word, and I want to take a substring that starts at index 0 and ends at the index of the next special character (space, ., !, ?, etc.). How would I go about doing that with a regex? Can I get the index of the first regex match? And what would the pattern look like?

Thanks in advance!

Upvotes: 0

Views: 4191

Answers (3)

hwnd

Reputation: 70750

You could use the following.

^\w+(?=\W)

Explanation:

^            # the beginning of the string
\w+          # word characters (a-z, A-Z, 0-9, _) (1 or more times)
(?=          # look ahead to see if there is:
  \W         #   non-word characters (all but a-z, A-Z, 0-9, _)
)            # end of look-ahead

Example:

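import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s  = "foobar!";
Pattern p = Pattern.compile("^\\w+(?=\\W)");
Matcher m = p.matcher(s);

if (m.find()) {
  // The look-ahead is zero-width, so end() is the index of the special character.
  System.out.println("Start:" + m.start() + " End:" + m.end()); // Start:0 End:6
  System.out.println(m.group());                                 // foobar
}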
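
One caveat with the look-ahead: it requires a non-word character after the word, so a string that consists of the word alone does not match at all. Below is a minimal sketch of one way to handle that case, assuming you want the whole string back when no special character follows. The class name LookaheadEnd, the helper endIndex, and the LENIENT pattern are illustrative names, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadEnd {
    // The pattern from the answer: a word followed by a non-word character.
    static final Pattern STRICT = Pattern.compile("^\\w+(?=\\W)");
    // Assumed variant: also accept end of input after the word.
    static final Pattern LENIENT = Pattern.compile("^\\w+(?=\\W|$)");

    // Returns the end index of the leading word, or -1 if the pattern does not match.
    static int endIndex(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(endIndex(STRICT, "foobar!"));  // 6
        System.out.println(endIndex(STRICT, "foobar"));   // -1
        System.out.println(endIndex(LENIENT, "foobar"));  // 6
    }
}
```

With either pattern, s.substring(0, endIndex(...)) gives the leading word whenever the result is not -1.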

Upvotes: 2

M A

Reputation: 72884

The following prints the substring that contains the word part of your string (\w denotes a word character, which includes digits and the underscore, while \W denotes a non-word character):

Pattern p = Pattern.compile("(\\w+)[\\W\\s]*");
Matcher matcher = p.matcher("word!,(. [&]");
if(matcher.find()) {
    System.out.println(matcher.group(1));
}

Output: word

Upvotes: 1

arshajii

Reputation: 129572

How would I go about doing that with a regex?

You can try something like this:

^.*?\p{Punct}
  • ^ matches start of string
  • .*? matches any characters reluctantly (as few as possible)
  • \p{Punct} matches one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
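
As a rough sketch of how this pattern behaves in Java: the match includes the punctuation character itself, so its index is end() - 1. Note also that \p{Punct} does not include the space character, so a space alone will not stop the match. The class name PunctPrefix and the helper firstPunctIndex are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctPrefix {
    private static final Pattern P = Pattern.compile("^.*?\\p{Punct}");

    // Returns the index of the first punctuation character, or -1 if there is none.
    static int firstPunctIndex(String s) {
        Matcher m = P.matcher(s);
        // The match ends just past the punctuation character, hence end() - 1.
        return m.find() ? m.end() - 1 : -1;
    }

    public static void main(String[] args) {
        String s = "foo bar, baz";
        int i = firstPunctIndex(s);
        System.out.println(i);                  // 7
        System.out.println(s.substring(0, i));  // foo bar
    }
}
```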

Can I get the index of the first regex match?

In general, you can obtain the indices of regex matches with Matcher#start.
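
For instance, here is a minimal sketch that searches for the first special character directly and uses Matcher#start to get its index. It assumes "special character" means any non-word character (\W), which covers spaces as well as punctuation; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstSpecialIndex {
    // Assumption: a "special character" is any non-word character.
    private static final Pattern SPECIAL = Pattern.compile("\\W");

    // Returns the index of the first non-word character, or -1 if there is none.
    static int firstSpecialIndex(String s) {
        Matcher m = SPECIAL.matcher(s);
        return m.find() ? m.start() : -1;
    }

    // Returns the text before the first special character, or the whole string.
    static String leadingWord(String s) {
        int i = firstSpecialIndex(s);
        return i < 0 ? s : s.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(firstSpecialIndex("foobar! baz")); // 6
        System.out.println(leadingWord("foobar! baz"));       // foobar
    }
}
```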

Upvotes: 1
