Madrugada

Reputation: 1289

Java Regex strange thing

I tried this regular expression to extract an email address. I have little experience with regexes, so I would like to ask whether you can see what's wrong with it, since it outputs one part twice:

regexp = "(\\w+)(\\(at\\))((\\w+\\.)+)([a-z]{2,3})";

Given the input "madrugada(at)yahoo.co.uk", the result contains "co" twice.

pattern = Pattern.compile(regexp);
m = pattern.matcher(my_input);
while (m.find()) {
    for (int i = 0; i <= m.groupCount(); i++) {
        // it would give out: madrugada (at) yahoo co co uk
        System.out.println(m.group(i));
    }
}

Thank you

Upvotes: 1

Views: 102

Answers (3)

Matt

Reputation: 11815

You also probably don't want to include m.group(0), since it holds the entire text matched by the whole regex rather than a single capture group. Start the loop at 1 instead:

for (int i=1;i<=m.groupCount();i++) {
  System.out.println(m.group(i));
}

Upvotes: 1

Alex

Reputation: 11110

import java.util.regex.*;
String a="madrugada(at)yahoo.co.in.ro.uk";
String regexp="(\\w+)(\\(at\\))(\\w+)((?:\\.\\w+)*)(\\.[a-z]{2,3})";
Pattern pattern = Pattern.compile (regexp);
Matcher m = pattern.matcher (a);
while (m.find()) {
    for (int i=0; i<=m.groupCount(); i++)
         System.out.println(m.group(i));
}

This produces the following output:

madrugada(at)yahoo.co.in.ro.uk
madrugada
(at)
yahoo
.co.in.ro
.uk

EDIT:

Updated the above to use a non-capturing group. The earlier version did not work because, although the group matched several .\w+ segments, a repeated capturing group keeps only its last iteration. I also changed the quantifier on the non-capturing group to * to accommodate madrugada(at)yahoo.uk
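To illustrate the point about the * quantifier, here is a minimal sketch that runs the regex from this answer against a two-label and a multi-label domain. The class name DomainLabelsDemo, the helper groups(), and the PLUS variant (the same regex with + instead of *) are assumptions made for the comparison, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainLabelsDemo {
    // Regex from the answer above: zero or more middle labels ("*").
    static final String STAR = "(\\w+)(\\(at\\))(\\w+)((?:\\.\\w+)*)(\\.[a-z]{2,3})";
    // Hypothetical variant that requires at least one middle label ("+").
    static final String PLUS = "(\\w+)(\\(at\\))(\\w+)((?:\\.\\w+)+)(\\.[a-z]{2,3})";

    // Returns groups 1..groupCount() of the first match,
    // or an empty list if the regex does not match at all.
    static List<String> groups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups(STAR, "madrugada(at)yahoo.uk"));
        System.out.println(groups(PLUS, "madrugada(at)yahoo.uk"));
        System.out.println(groups(STAR, "madrugada(at)yahoo.co.in.ro.uk"));
    }
}
```

With *, the fourth group is simply the empty string for madrugada(at)yahoo.uk, while the + variant finds no match for that input and the helper returns an empty list.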

Upvotes: 1

John Haager

Reputation: 2115

You have an extra set of parentheses in your regex. One capture group is nested inside another, so when you loop through the capture groups both are returned, and the text captured by the inner group shows up a second time in the output.

Try this

regexp = "(\\w+)(\\(at\\))(\\w+\\.)+([a-z]{2,3})";

Edit: An alternative regex that uses a non-capturing group should also solve the problem:

regexp = "(\\w+)(\\(at\\))((?:\\w+\\.)+)([a-z]{2,3})";
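As a rough check of the three patterns discussed here, the following sketch lists the capture groups each one produces for the input from the question. The class name NestedGroupDemo and the helper groups() are illustrative assumptions. The regex strings are copied from the question and from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroupDemo {
    // Regex from the question: a capturing group nested inside another one.
    static final String NESTED = "(\\w+)(\\(at\\))((\\w+\\.)+)([a-z]{2,3})";
    // First suggestion: the repeated group itself is the only capture.
    static final String REPEATED = "(\\w+)(\\(at\\))(\\w+\\.)+([a-z]{2,3})";
    // Second suggestion: the repeated inner group is non-capturing.
    static final String NON_CAPTURING = "(\\w+)(\\(at\\))((?:\\w+\\.)+)([a-z]{2,3})";

    // Returns groups 1..groupCount() of the first match, skipping group 0
    // (the whole match); the list is empty if there is no match.
    static List<String> groups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "madrugada(at)yahoo.co.uk";
        System.out.println(groups(NESTED, input));
        System.out.println(groups(REPEATED, input));
        System.out.println(groups(NON_CAPTURING, input));
    }
}
```

The nested version yields five groups, including both "yahoo.co." and "co.". The first suggestion yields four groups but keeps only "co." from the repetition, so "yahoo." is lost. The non-capturing version yields four groups with "yahoo.co." intact.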

Upvotes: 3
