Reputation: 451
I am trying to write a regular expression for something like
s1 = I am at Boston at Downtown
s2 = I am at Miami
I am interested in the words after "at", e.g. Boston, Downtown, Miami.
I have not been successful in creating a regex for that. Something like
> .*? (at \w+)+.*
gives just Boston in s1 (Downtown is missed); it only matches the first "at". Any suggestions?
Upvotes: 4
Views: 227
Reputation: 75222
You seem to expect `(at \w+)+` to match both `at Boston` and `at Downtown` in the first string. That doesn't work because you don't allow for the space before the second `at`. You would need to change it to `( at \w+)+`, or better, change that to a non-capturing group and use the capturing group for the part that really interests you:
// The non-capturing group repeats; the inner capturing group holds the word after " at "
Pattern p = Pattern.compile(".*?(?: at (\\w+))+.*");
String s1 = "I am at Boston at Downtown";
Matcher m = p.matcher(s1);
if (m.matches()) {
    System.out.println(m.group(1));
}
But now it only prints `Downtown`. That's because you're trying to use one capturing group to capture two substrings. The first time `(?: at (\\w+))+` matches, it captures `Boston`; the second time, it discards `Boston` and captures `Downtown` instead.
There are some regex flavors that will let you retrieve intermediate captures (`Boston` in this example), but Java is not one of them. Your best option is probably to use `find()` instead of `matches()`, as @arclight suggested. That makes the regex simpler, too:
Pattern p = Pattern.compile("\\bat\\s+(\\w+)");
String s1 = "I am at Boston at Downtown";
Matcher m = p.matcher(s1);
// find() advances to each successive match, so every word after "at" is printed
while (m.find()) {
    System.out.println(m.group(1));
}
You don't have to match the space before `at` any more, but you probably want to use `\b` (word boundary) to avoid partial-word matches (e.g., `My cat is at Boston at Downtown`). And it's usually a good idea to use `\s+` instead of a literal space, in case there are multiple spaces, or the space is really a TAB or something.
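To see what the word boundary buys you, here is a minimal sketch that runs the pattern with and without `\b` on the partial-word example. The class name `WordBoundaryDemo` and the helper `wordsAfter` are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class WordBoundaryDemo {

    // Collects group 1 of every match of the given regex in the input.
    public static List<String> wordsAfter(String regex, String input) {
        List<String> words = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "My cat is at Boston at Downtown";
        // Without \b, the "at" inside "cat" also matches and captures "is".
        System.out.println(wordsAfter("at\\s+(\\w+)", s));    // [is, Boston, Downtown]
        // With \b, only the standalone word "at" matches.
        System.out.println(wordsAfter("\\bat\\s+(\\w+)", s)); // [Boston, Downtown]
    }
}
```

The leading `\b` is enough here because the `\s+` that follows already rules out words that merely start with "at".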
Upvotes: 2
Reputation: 5310
Try this:
at\s+(\w+)
The complete code snippet would be:
// Flags are combined with bitwise OR into the single int argument
Pattern myPattern = Pattern.compile("at\\s+(\\w+)", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher m = myPattern.matcher(yourString);
while (m.find()) {
    String word = m.group(1);
}
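If you want the words back as a list rather than handling them one at a time, one possible sketch on Java 9 or later uses `Matcher.results()`. The class `AtWords` and the method `extract` are hypothetical names chosen for this example. `DOTALL` is left out because the pattern contains no `.` for it to affect.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class; the names are illustrative only.
public class AtWords {

    // Compiled once and reused; CASE_INSENSITIVE also matches "At" or "AT".
    private static final Pattern AT_WORD =
            Pattern.compile("at\\s+(\\w+)", Pattern.CASE_INSENSITIVE);

    // Returns the word following each "at" in the input, in order of appearance.
    public static List<String> extract(String input) {
        return AT_WORD.matcher(input)
                .results()                 // Stream<MatchResult>, available since Java 9
                .map(r -> r.group(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(extract("I am at Boston at Downtown")); // [Boston, Downtown]
        System.out.println(extract("I am at Miami"));              // [Miami]
    }
}
```

On Java 8, the `while (m.find())` loop is the way to go, since `results()` is not available there.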
Upvotes: 7