Reputation: 25

What does this regular expression mean: \\d{3}-\\d{4}.*

Please note the spaces in the regex! Thanks to everybody who tries to contribute. With the spaces, I guess it is really challenging.

I saw the current code has the following:

Perl5Compiler compiler = new Perl5Compiler();
Perl5Matcher matcher = new Perl5Matcher();
Pattern pattern = compiler.compile("\\ d{ 3 } -\\d{4}.* "); // pattern for string starting with " 00 0 - 00 0 0 "   

if (matcher.matches(Num, pattern)) {  
    return true;  
}

However, I don't think "\\ d{ 3 } -\\d{4}.* " will match " 00 0 - 00 0 0 ". Does anyone know what this regular expression really means? Or, from another perspective, what is the correct regex for " 00 0 - 00 0 0 "?

Upvotes: 1

Answers (5)

Sergiu Toarca

Reputation: 2749

The regular expression \ d{3} -\d{4}.* matches strings of the form " ddd -XXXXY" (note the leading space), where each d is the literal letter d, each X can be any digit, and Y can be any string.

It is easier to see what this regex does when you have a visual helper to show you what's going on: http://www.debuggex.com/?re=%5C+d%7B3%7D+-%5Cd%7B4%7D.%2A+&str=+ddd+-9662%C2%A3%C2%AA%C2%A3%3B%29+

Upvotes: 1

gnomed

Reputation: 5565

These people are correct that it will match ###-####

But they forgot to explain the .*, which essentially means "anything else": the . represents any character except a newline, and the * repeats it zero or more times.

When the regex is used to search within a string, the .* has little effect on whether the match succeeds, but because . stops at a newline, the match cannot extend past the end of the line. This is usually what you want, depending on whether you expect newlines in your input and what they mean.
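A minimal sketch of the newline behaviour described above. It uses java.util.regex from the JDK instead of Jakarta ORO, which is an assumption made so the example runs without extra libraries; ORO may differ in details. The class, field, and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotNewlineDemo {
    // Default mode: '.' does not match a line terminator such as '\n'.
    static final Pattern DEFAULT = Pattern.compile("\\d{3}-\\d{4}.*");
    // DOTALL mode: '.' matches any character, including '\n'.
    static final Pattern DOTALL = Pattern.compile("\\d{3}-\\d{4}.*", Pattern.DOTALL);

    // Returns the first substring found by a search, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String twoLines = "123-1234 home\nsecond line";
        // In default mode the match stops at the end of the first line.
        System.out.println(firstMatch(DEFAULT, twoLines));
        // In DOTALL mode the match runs through the newline to the end.
        System.out.println(firstMatch(DOTALL, twoLines));
    }
}
```

In default mode the search returns only "123-1234 home"; with Pattern.DOTALL the same pattern consumes both lines.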

EDIT: First of all, the edited regex will not compile in Java with spaces between the {}. Also, spaces are meaningless inside there; a quantifier expects only numbers between the braces.

So, assuming you remove those spaces from between the {} it would match

" ddd -#### "

Here "d" is quite literally the letter "d", and "#" is again any digit. As before, this is optionally followed by anything because of the .*, but because there is now an extra space after the .*, the matching string must also end with at least one space. That is a pretty useless expression, though. Are you sure you want to interpret that first d literally? Perhaps you should check your syntax again.

Also, fun fact: there is no need to escape the first space. Your regex

"\\ d{3} -\\d{4}.* " is syntactically equivalent to " d{3} -\\d{4}.* "

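The following sketch checks the two claims above for the pattern with the spaces removed from inside the braces. It assumes java.util.regex from the JDK rather than Jakarta ORO, so behaviour under ORO is not guaranteed to be identical; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class SpacedPatternDemo {
    // The question's pattern with the spaces removed from inside the braces.
    static final String ESCAPED_SPACE = "\\ d{3} -\\d{4}.* ";
    // The same pattern without the backslash before the first space.
    static final String PLAIN_SPACE = " d{3} -\\d{4}.* ";

    // Whole-input match, like the matches() call in the question.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] inputs = {" ddd -1234 ", " ddd -1234 ext 5 ", " ddd -1234", "123-1234"};
        for (String input : inputs) {
            // Both spellings of the pattern are expected to agree on every input.
            System.out.println("[" + input + "] escaped="
                    + matchesWhole(ESCAPED_SPACE, input)
                    + " plain=" + matchesWhole(PLAIN_SPACE, input));
        }
    }
}
```

Only the inputs that start with a space and the literal letters ddd, and that end with a space, match; "123-1234" does not.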
Upvotes: 4

ikegami

Reputation: 385655

The string literal

"\\d{3}-\\d{4}.*"

produces the string

\d{3}-\d{4}.*

When used as a Perl5Matcher regex pattern, it matches strings that

Start with 3 digits*
Followed by a dash
Followed by 4 digits
Followed by 0 or more characters that aren't newlines**
Followed by the end of the string.

For example,

123-1234: match
123-1234XYZ: match
123-1A34: no match
1234-123: no match
X123-1234: no match

* — In Perl, a digit is any character with the Unicode "Decimal Number" General Category. In Unicode 6.0, there are 420 such characters including 0 to 9. I don't know exactly what characters \d matches when using the Perl5Matcher library. Use [0-9] instead of \d to only match 0 to 9.

** — By default, . matches any character except a newline. Perl5Compiler can be told that . should match any character including a newline.
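The example table above can be reproduced with a short sketch. It uses java.util.regex from the JDK as a stand-in for Perl5Matcher (an assumption; the two libraries may treat \d and other details differently), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class PhonePatternDemo {
    // The string literal "\\d{3}-\\d{4}.*" yields the regex \d{3}-\d{4}.*
    static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{4}.*");

    // matches() requires the pattern to cover the entire input.
    static boolean matchesWhole(String input) {
        return PHONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"123-1234", "123-1234XYZ", "123-1A34", "1234-123", "X123-1234"};
        for (String input : inputs) {
            System.out.println(input + ": " + (matchesWhole(input) ? "match" : "no match"));
        }
    }
}
```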

Upvotes: 7

fge

Reputation: 121712

This code uses Jakarta ORO (which has been retired for two years, BTW).

The only explanation I can see is that the spaces have been completely messed up: if you take the regex ^\d{3}-\d{4}.*$, it actually matches what the (space-challenged) comment says it does, i.e. any string starting with three digits, then a hyphen, then four digits.

And note that .matches() is a misnomer (and so are Java's .matches() methods for that matter) since it tries to match the whole input, which is not the definition of regex matching (and which is why I anchored the regex).
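To illustrate the difference between whole-input matching and ordinary regex searching, here is a small sketch using java.util.regex, whose matches(), find(), and lookingAt() methods are part of the JDK. It says nothing definitive about Jakarta ORO's own methods; the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchVsFindDemo {
    // Deliberately unanchored and without the trailing .*
    static final Pattern UNANCHORED = Pattern.compile("\\d{3}-\\d{4}");

    // True only if the pattern covers the entire input.
    static boolean whole(String s) {
        return UNANCHORED.matcher(s).matches();
    }

    // True if the pattern occurs anywhere in the input.
    static boolean anywhere(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // True if the pattern matches a prefix of the input.
    static boolean atStart(String s) {
        return UNANCHORED.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123-1234", "123-1234XYZ", "X123-1234"}) {
            System.out.println(s + " whole=" + whole(s)
                    + " anywhere=" + anywhere(s) + " atStart=" + atStart(s));
        }
    }
}
```

With matches(), "123-1234XYZ" is rejected unless the pattern ends in .*, while find() accepts it and also accepts "X123-1234"; lookingAt() sits in between by anchoring only at the start.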

Upvotes: 1

Jon Newmuis

Reputation: 26492

It looks like (with spaces removed) it's supposed to match a phone number (sans country code and area code).

\d{3}-\d{4} means <three digits>-<four digits>, or XXX-XXXX (where each X is a digit).

Upvotes: 1
