Reputation: 1289
I tried this regular expression to extract an email address. I have little experience with regexes, so I would like to ask whether you can see what is wrong with it, since it doubles one word:
regexp = "(\\w+)(\\(at\\))((\\w+\\.)+)([a-z]{2,3})";
Given the input "madrugada(at)yahoo.co.uk", the result contains "co" twice.
pattern = Pattern.compile(regexp);
m = pattern.matcher(my_input);
while (m.find()) {
    for (int i = 0; i <= m.groupCount(); i++) {
        // it would give out: madrugada (at) yahoo co co uk
        System.out.println(m.group(i));
    }
}
Thank you
Upvotes: 1
Views: 102
Reputation: 11815
You also don't really want to include m.group(0), as it contains the whole segment that matched your overall RE.
for (int i=1;i<=m.groupCount();i++) {
System.out.println(m.group(i));
}
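To make the difference between group 0 and the numbered groups concrete, here is a minimal sketch that uses the regex from the question. The class name GroupZeroDemo and the helper subgroups are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupZeroDemo {
    // Regex copied from the question; class and method names are hypothetical.
    static final Pattern P =
            Pattern.compile("(\\w+)(\\(at\\))((\\w+\\.)+)([a-z]{2,3})");

    // Collects groups 1..groupCount() of the first match, skipping group 0.
    static List<String> subgroups(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "madrugada(at)yahoo.co.uk";
        Matcher m = P.matcher(input);
        if (m.find()) {
            // Group 0 is always the entire matched text.
            System.out.println("group(0): " + m.group(0));
        }
        // Groups 1..n are the parenthesised sub-expressions.
        System.out.println(subgroups(input));
    }
}
```

Note that the nested group still shows up as its own entry ("co."), so skipping group 0 alone does not remove the duplicate.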
Upvotes: 1
Reputation: 11110
import java.util.regex.*;
String a="madrugada(at)yahoo.co.in.ro.uk";
String regexp="(\\w+)(\\(at\\))(\\w+)((?:\\.\\w+)*)(\\.[a-z]{2,3})";
Pattern pattern = Pattern.compile (regexp);
Matcher m = pattern.matcher (a);
while (m.find()) {
for (int i=0; i<=m.groupCount(); i++)
System.out.println(m.group(i));
}
This produces the following output:
madrugada(at)yahoo.co.in.ro.uk
madrugada
(at)
yahoo
.co.in.ro
.uk
EDIT:
Updated the above to use a non-capturing group. It did not work before because, even though the group matched multiple .\w+ patterns, a repeated capturing group only retains its last match. I also changed the quantifier on the non-capturing group to * in order to accommodate madrugada(at)yahoo.uk.
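The behaviour described in the edit can be checked in isolation. This is a small sketch, not part of the original answer; RepeatedGroupDemo and firstGroup are hypothetical names, and the input is just the middle part of the example address.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // Returns group 1 of the first match of regex in input, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = ".co.in.ro";
        // Quantifier applied to the capturing group itself:
        // the group is overwritten on every repetition, so only the last one survives.
        System.out.println(firstGroup("(\\.\\w+)*", input));
        // Quantifier applied to a non-capturing group inside one capturing group:
        // the outer group spans all repetitions.
        System.out.println(firstGroup("((?:\\.\\w+)*)", input));
    }
}
```

The first call yields ".ro" and the second yields ".co.in.ro", which is why the non-capturing group has to be wrapped in a single capturing group.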
Upvotes: 1
Reputation: 2115
You have an extra set of parentheses in your regex. When you loop through the capture groups, both the outer group and the one nested inside it are returned. The outer group captures "yahoo.co." and the inner one captures its last repetition, "co.", so that word appears twice in the output.
Try this
regexp = "(\\w+)(\\(at\\))(\\w+\\.)+([a-z]{2,3})";
Edit: The regex above keeps only the last repetition ("co.") in group 3, so an alternative that puts a non-capturing group inside a single capturing group should solve the problem:
regexp = "(\\w+)(\\(at\\))((?:\\w+\\.)+)([a-z]{2,3})";
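For completeness, a minimal sketch of how the non-capturing variant could be used to rebuild the address. It assumes the goal is to turn "(at)" into "@", which the question implies but does not spell out; EmailDeobfuscator and toEmail are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailDeobfuscator {
    // Non-capturing variant from the answer: group 3 spans every "label." repetition.
    static final Pattern P =
            Pattern.compile("(\\w+)(\\(at\\))((?:\\w+\\.)+)([a-z]{2,3})");

    // Rebuilds the address from the first match, or returns null if there is none.
    // Assumption: "(at)" (group 2) should simply be replaced with "@".
    static String toEmail(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        // group(1) = local part, group(3) = domain labels with trailing dots, group(4) = TLD
        return m.group(1) + "@" + m.group(3) + m.group(4);
    }

    public static void main(String[] args) {
        System.out.println(toEmail("madrugada(at)yahoo.co.uk"));
        System.out.println(toEmail("madrugada(at)yahoo.uk"));
    }
}
```

With four groups instead of five, there is no nested group left to repeat "co.".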
Upvotes: 3