Reputation: 92079
Problem
I am trying to extract words from the input
Pacific Gas & Electric (PG&E), San Diego Gas & Electric (SDG&E), Salt River Project (SRP), Southern California Edison (SCE)
I tried doing that online and my pattern (\w\s?&?\s?\(?\)?)
seems to work.
But when I run it in my Java program, it does not find a match:
private static void findWords() {
    final Pattern PATTERN = Pattern.compile("(\\w\\s?&?\\s?\\(?\\)?)");
    final String INPUT = "Pacific Gas & Electric (PG&E), San Diego Gas & Electric (SDG&E), Salt River Project (SRP), Southern California Edison (SCE)";
    final Matcher matcher = PATTERN.matcher(INPUT);
    System.out.println(matcher.matches());
}
It prints false.
Question
Pacific Gas & Electric (PG&E)
as match group 1, and so on.
Upvotes: 1
Views: 315
Reputation: 31204
You might want to re-evaluate the output you're getting from Rubular.
From the documentation:
The matches method attempts to match the entire input sequence against the pattern.
What you have in Rubular finds a bunch of matches because just about every character is a match on its own.
Nowhere in your Rubular result does it say that the entire string is a match.
A regular expression to match words is extremely simple.
You can use
\b\S*\b
http://rubular.com/r/ljYs1xO1Qh
or simply
\S*
http://rubular.com/r/xgEuGse1lc
depending on your needs
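As a rough sketch of how such a pattern could be used from Java, the loop below collects whitespace-separated tokens with Matcher#find(). Note one deliberate change: it uses \S+ rather than \S*, because the * quantifier also allows zero-length matches, which would add empty strings to the result. The class and method names (TokenDemo, tokens) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenDemo {
    // Hypothetical helper: collects every run of non-whitespace characters.
    // Uses \S+ (one or more) instead of \S* to avoid zero-length matches.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Punctuation stays attached to the token, e.g. "(PG&E),"
        System.out.println(tokens("Pacific Gas & Electric (PG&E), San Diego"));
    }
}
```

Tokens found this way keep their punctuation, so this only helps if splitting on whitespace is really what you need.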
Upvotes: 3
Reputation: 11818
Matcher#matches() returns true only if the whole string matches the regular expression.
As you can see in your online matcher, your regex does not match the whole string, only a single character at a time (sometimes a bit more). So your regex matches "P" and "a" and "c" and "i" and so on. You should fix your regex first and then use Matcher#find() and Matcher#group() to get the matching groups.
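To make the point concrete, here is a minimal sketch that runs the question's original regex through a find()/group() loop and collects what group 1 captures each time. The class and method names (FindDemo, fragments) are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // The pattern from the question, unchanged.
    static final Pattern PATTERN = Pattern.compile("(\\w\\s?&?\\s?\\(?\\)?)");

    // Hypothetical helper: returns what group 1 captured on each successful find().
    static List<String> fragments(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Each element is one word character plus a few optional trailing
        // characters, e.g. "P", "a", ..., "s & ", ..., "E)"
        System.out.println(fragments("Pacific Gas & Electric (PG&E)"));
    }
}
```

The output is a list of short fragments rather than company names, which is why the regex itself needs fixing before find() becomes useful.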
Upvotes: 2
Reputation: 39385
If you want to get the matches out of your string, here is something you can try:
final String INPUT = "Pacific Gas & Electric (PG&E), San Diego Gas & Electric (SDG&E), Salt River Project (SRP), Southern California Edison (SCE)";
Pattern pattern = Pattern.compile("(.*?\\([^)]+\\))(?:,\\s*|$)");
Matcher m = pattern.matcher(INPUT);
while (m.find()) {
    System.out.println(m.group(1));
}
Alternatively, you can use INPUT.split("\\s*,\\s*"); if the names don't contain any commas.
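The two approaches can be compared side by side in a small sketch. It assumes, as stated above, that no name contains a comma; the class and method names (SplitDemo, viaFind, viaSplit) are only illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitDemo {
    static final String INPUT = "Pacific Gas & Electric (PG&E), San Diego Gas & Electric (SDG&E), "
            + "Salt River Project (SRP), Southern California Edison (SCE)";

    // Regex approach: lazily capture everything up to a parenthesised
    // abbreviation that is followed by a comma or the end of the input.
    static List<String> viaFind(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = Pattern.compile("(.*?\\([^)]+\\))(?:,\\s*|$)").matcher(input);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Split approach: only safe when no name contains a comma (assumption).
    static List<String> viaSplit(String input) {
        return Arrays.asList(input.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        viaFind(INPUT).forEach(System.out::println);
        System.out.println(viaFind(INPUT).equals(viaSplit(INPUT)));
    }
}
```

For this particular input both methods return the same four names; split is shorter, while the regex version also insists that each entry ends with a parenthesised abbreviation.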
Now, coming to the question "Why is there a mismatch, seems like my understanding is poor here": because the matches() method of the Matcher class (like String#matches()) performs matching over the whole string.
Upvotes: 0
Reputation: 213311
If you use the Matcher#find() method instead of the Matcher#matches() method, you'll get true as the outcome. The reason is that the matches() method assumes implicit anchors, a caret (^) and a dollar ($), at the two ends of the pattern. So it tries to match the entire string against the regex, and if the entire string does not match, it returns false.
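The difference can be seen in a short sketch that runs the same regex three ways: with matches(), with find(), and with find() on a pattern that has the anchors written out explicitly. The helper names are made up, and the explicit-anchor version is only an approximation of matches(), since $ in Java can also match just before a trailing line terminator.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Whole-input match: behaves as if the pattern were anchored at both ends.
    static boolean viaMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Substring match: true if the pattern matches anywhere in the input.
    static boolean viaFind(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // find() with explicit anchors; roughly equivalent to matches() for
    // input without line terminators (an assumption of this sketch).
    static boolean viaAnchoredFind(String regex, String input) {
        return Pattern.compile("^(?:" + regex + ")$").matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "(\\w\\s?&?\\s?\\(?\\)?)"; // pattern from the question
        String input = "Pacific Gas & Electric (PG&E)";
        System.out.println(viaMatches(regex, input));      // false
        System.out.println(viaFind(regex, input));         // true
        System.out.println(viaAnchoredFind(regex, input)); // false
    }
}
```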
Upvotes: 4