BSingh

Reputation: 451

java regular expression

I am trying to write a regular expression for something like

s1 = I am at Boston at Downtown
s2 = I am at Miami

I am interested in the words after "at", e.g. Boston, Downtown, Miami.

I have not been successful in creating a regex for that. Something like

> .*? (at \w+)+.*

gives just Boston in s1 (Downtown is missed); it only matches the first "at". Any suggestions?

Upvotes: 4

Views: 227

Answers (2)

Alan Moore

Reputation: 75222

You seem to expect (at \w+)+ to match both at Boston and at Downtown in the first string. That doesn't work because you don't allow for the space before the second at. You would need to change it to ( at \w+)+, or better, change that to a non-capturing group and use the capturing group for the part that really interests you:

Pattern p = Pattern.compile(".*?(?: at (\\w+))+.*");
String s1 = "I am at Boston at Downtown";
Matcher m = p.matcher(s1);
if (m.matches()) {
    System.out.println(m.group(1));
}

But now it only prints Downtown. That's because you're trying to use one capturing group to capture two substrings. The first time (?: at (\\w+))+ matches, it captures Boston; the second time, it discards Boston and captures Downtown instead.

There are some regex flavors that will let you retrieve intermediate captures (Boston in this example), but Java is not one of them. Your best option is probably to use find() instead of matches(), as @arclight suggested. That makes the regex simpler, too:

Pattern p = Pattern.compile("\\bat\\s+(\\w+)");
String s1 = "I am at Boston at Downtown";
Matcher m = p.matcher(s1);
while (m.find()) {
    System.out.println(m.group(1));
}

You don't have to match the space before at any more, but you probably want to use the \b (word boundary) to avoid partial-word matches (e.g., My cat is at Boston at Downtown). And it's usually a good idea to use \s+ instead of a literal space, in case there are multiple spaces, or the space is really a TAB or something.
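To make the word-boundary point concrete, here is a minimal sketch that runs the pattern with and without \b on the example sentence above. The class name WordBoundaryDemo and the helper captures are made up for illustration; only Pattern, Matcher and find() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: collects group 1 of every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "My cat is at Boston at Downtown";
        // Without \b, the "at" inside "cat" also matches, so "is" is captured.
        System.out.println(captures("at\\s+(\\w+)", s));    // prints [is, Boston, Downtown]
        // With \b, only the standalone word "at" matches.
        System.out.println(captures("\\bat\\s+(\\w+)", s)); // prints [Boston, Downtown]
    }
}
```

Because the separator is \s+ rather than a literal space, the same pattern also copes with input such as "at\tBoston" or "at   Downtown".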

Upvotes: 2

arclight

Reputation: 5310

Try this

 at\s+(\w+)

The complete code snippet would be

// Pattern.compile takes a single int of flags; combine several with bitwise OR.
Pattern myPattern = Pattern.compile("at\\s+(\\w+)", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher m = myPattern.matcher(yourString);

while(m.find()) {
  String word = m.group(1);
}
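For anyone who wants to try the loop above as a complete program, here is a rough sketch that collects the captured words into a list for both strings from the question. The class name WordsAfterAt and the method extract are hypothetical, and DOTALL is left out on the assumption that it makes no difference for a pattern that contains no dot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsAfterAt {
    // Same regex as in the answer; the flag is passed as the single int argument.
    private static final Pattern AT_WORD =
            Pattern.compile("at\\s+(\\w+)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every word that directly follows "at".
    static List<String> extract(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = AT_WORD.matcher(input);
        while (m.find()) {          // find() advances to the next match each call
            words.add(m.group(1));  // group(1) is the word captured by (\w+)
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extract("I am at Boston at Downtown")); // prints [Boston, Downtown]
        System.out.println(extract("I am at Miami"));              // prints [Miami]
    }
}
```

Note that this pattern has no word boundary, so an "at" at the end of a longer word followed by whitespace would also match.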

Upvotes: 7
