dido

Reputation: 3407

Why does this regular expression not return the second word?

I have the following regular expression (code below), but I am confused about why it does not return 'Banas', since it is a word of between 2 and 20 characters.

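    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    Pattern p = Pattern.compile("\\s[A-Za-z]{2,20}\\s");
    Matcher m = p.matcher(" Derek Banas CA 1234 PA (750)555-1234");

    // Print every match found in the input
    while (m.find()) {
        System.out.println(m.group());
    }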

The output is below. Why is "Banas" not in the output? Thanks.

    Derek
    CA
    PA

Upvotes: 0

Views: 92

Answers (3)

ajb

Reputation: 31699

Using \\b (as @Pshemo answered) is probably the best answer for your problem. I wanted to mention another possibility, though: if you use lookahead, you can look for a space (or any other pattern) without consuming it.

    Pattern p = Pattern.compile("\\s[A-Za-z]{2,20}(?=\\s)");

Now the pattern will match if the sequence of letters is followed by a space, but the space won't become part of the match, and it will remain in the string so that it can be matched by the next call to find(). The strings returned by m.group() will be " Derek", " Banas", " CA", " PA".
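To see this in action, here is a minimal sketch of the lookahead version. The class name LookaheadDemo and the helper findAll are just illustrative names, not anything from the question. Each match is printed in brackets so the leading space is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = " Derek Banas CA 1234 PA (750)555-1234";
        // The trailing whitespace is only checked by the lookahead, not consumed,
        // so it is still available as the leading \s of the next match.
        for (String s : findAll("\\s[A-Za-z]{2,20}(?=\\s)", input)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

This should print [ Derek], [ Banas], [ CA] and [ PA], one per line. If you don't want the leading space, calling trim() on each match is one simple option.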

Upvotes: 1

Pshemo

Reputation: 124275

Because the first match consumed the space after Derek, Banas can't use it as its leading whitespace. Try changing your regex to "\\b[A-Za-z]{2,20}\\b". \\b is a word boundary, which matches only at positions that are:

  • before the first character in the string, if the first character is a word character.
  • after the last character in the string, if the last character is a word character.
  • between two characters in the string, where one is a word character and the other is not a word character.
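As a quick check of the \\b version, here is a small sketch run against the input from the question. The class name WordBoundaryDemo and the helper findAll are made-up names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = " Derek Banas CA 1234 PA (750)555-1234";
        // \b is zero-width: it asserts a position and consumes no characters,
        // so the space between two words is never "used up" by a match.
        for (String word : findAll("\\b[A-Za-z]{2,20}\\b", input)) {
            System.out.println(word);
        }
    }
}
```

This should print Derek, Banas, CA and PA, without any surrounding spaces. Keep in mind that digits count as word characters, so a run of letters directly followed by digits (for example "abc123") would not match this pattern.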

Upvotes: 6

Laurence Quinn

Reputation: 61

Because your regular expression requires a whitespace character at both the start and the end. When it matches " Derek ", the trailing space is consumed, so the next part of your string is "Banas ", and your regular expression would only match " Banas ".
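If you want to see the consumed space for yourself, a rough sketch like the one below prints where each match of the original pattern starts and ends. The class name ConsumedSpaceDemo and the helper spans are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsumedSpaceDemo {
    // Hypothetical helper: returns "start-end" for each match, where start is
    // inclusive and end is exclusive (as reported by Matcher.start() and end()).
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = " Derek Banas CA 1234 PA (750)555-1234";
        // The original pattern from the question, with \s on both sides.
        for (String span : spans("\\s[A-Za-z]{2,20}\\s", input)) {
            System.out.println(span);
        }
    }
}
```

The first match should cover indices 0-7, which includes the space at index 6. The next search therefore resumes at index 7, the 'B' of Banas, where there is no leading whitespace left to match.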

Upvotes: 3
