santa

Reputation: 12512

Regex too rigid

I use the following regex to find a phone number in a string:

([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})

It works great on numbers like:

555-555-5555 (555)555-5555 (555) 555-5555

However, if there's an extra space inside the number, the pattern does not find it:

555 -555-5555 (555)555- 5555 (555) 555 -5555

Can it be modified to allow for a space or two? My input comes from OCR, not from users, so I can't require formatted input.

Thanks.

Upvotes: 2

Views: 119

Answers (4)

Casimir et Hippolyte

Reputation: 89614

To limit the number of added spaces, you can check the position of the first digit of the last group (you can also choose the last digit). Then all you have to do is describe the different separators the way you want.

~[(\d](?:\b\d{3}\)(?=.{3,5}\W\b) {0,2}\d{3}|\B\d{2}(?=.{4,6}\W\b)(?:- ?| -? ?)\d{3})(?:- ?| -? ?)\d{4}\b~


The same pattern in a more readable form:

~
[(\d]  # first character discrimination technique (avoids the cost of an alternation
       # at the start of the pattern)
(?: # with brackets
    \b \d{3} \)
    (?= .{3,5} \W \b )
    \g<spb> \d{3}
  | # without brackets
    \B \d{2} # you can also replace \B with (?<=\b\d) to check the word-boundary
    (?= .{4,6} \W \b )
    \g<sp> \d{3}
)
\g<sp> \d{4} \b

# subpattern definitions:
(?<spb> [ ]{0,2} ){0}             # separator after bracket
(?<sp> - [ ]? | [ ] -? [ ]? ){0}  # other separators
~x


Feel free to change - to [.-] or to define your own allowed separators. In that case, don't forget to also change the quantifiers in the lookaheads. Also, if you want to allow the second separator to be empty, check the boundary after the last digit instead of the boundary before the first digit of the last group.
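As a rough illustration of how the one-line pattern above behaves, here is a minimal PHP sketch. The sample strings and the variable names ($samples, $matched) are assumptions made for this example; only the pattern itself comes from the answer.

```php
<?php
// One-line pattern copied from the answer above.
$pattern = '~[(\d](?:\b\d{3}\)(?=.{3,5}\W\b) {0,2}\d{3}|\B\d{2}(?=.{4,6}\W\b)(?:- ?| -? ?)\d{3})(?:- ?| -? ?)\d{4}\b~';

// Hypothetical inputs: the six variants from the question,
// plus one with three spaces, which is more than the pattern allows.
$samples = [
    '555-555-5555',
    '(555)555-5555',
    '(555) 555-5555',
    '555 -555-5555',
    '(555)555- 5555',
    '(555) 555 -5555',
    '555   -555-5555',
];

// Record, per sample, whether the pattern finds a phone number in it.
$matched = [];
foreach ($samples as $sample) {
    $matched[$sample] = preg_match($pattern, $sample) === 1;
}

// Print only the samples that matched.
print_r(array_keys(array_filter($matched)));
```

With these inputs, the first six strings should be reported and the three-space variant should not, because each separator permits at most one space on either side of the hyphen.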

Upvotes: 0

mickmackusa

Reputation: 48011

I feel like you are asking for a very lenient / inclusive pattern.

This one is pretty forgiving:

/\(?\d{3}\)? {0,2}[-.]? {0,2}\d{3} {0,2}[-.]? {0,2}\d{4}/


It will match all of these variants (...and more):

555-555-5555
(555)555-5555
(555) 555-5555
555 -555-5555
(555)555- 5555
(555) 555 -5555
555.555-5555
555.555.5555
5555555555
555-555.5555
(555)5555555
(555).555.5555
(555)-555-5555
(555555-5555
555)-555-5555
555555-5555
555 5555555
555 555 5555
555 - 555 - 5555
555555  .  5555

The pattern logic is in this order:

  • permit an optional (
  • require 3 digits
  • permit an optional )
  • permit zero, one, or two literal spaces
  • permit an optional hyphen or dot
  • permit zero, one, or two literal spaces
  • require 3 digits
  • permit zero, one, or two literal spaces
  • permit an optional hyphen or dot
  • permit zero, one, or two literal spaces
  • require 4 digits
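The steps listed above could be exercised in PHP roughly as follows. This is only a sketch: the $ocrText string is invented for the example, and preg_match_all is one possible way to pull every match out of a longer text.

```php
<?php
// Pattern copied from this answer.
$pattern = '/\(?\d{3}\)? {0,2}[-.]? {0,2}\d{3} {0,2}[-.]? {0,2}\d{4}/';

// Hypothetical OCR output containing the three problem variants from the question.
$ocrText = 'Call 555 -555-5555 or (555)555- 5555, fax (555) 555 -5555.';

// Collect every full match; $matches[0] holds the matched substrings.
preg_match_all($pattern, $ocrText, $matches);
$found = $matches[0];

print_r($found);
```

For this input, $found should contain the three numbers exactly as they appear in the text, stray spaces included. Since neither bracket is tied to the other, the pattern also accepts unbalanced input such as (555555-5555, which is the price of being this lenient.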

Upvotes: 0

Jan

Reputation: 43179

As per your examples, you could use

[(\d](?:(?!\h{2,})[-\d()\h])*\d

See a demo on regex101.com.


That is

[(\d]          # one of ( or 0-9
(?:            # a non-capturing group
    (?!\h{2,}) # make sure 2+ horizontal whitespace characters are not immediately ahead
    [-\d()\h]  # then match one of -, 0-9, ( ) or a horizontal whitespace character
)*             # zero or more times
\d             # the end must be a digit

It is a variation of the tempered greedy token.


In PHP this could be

<?php
$data = <<<DATA
555-555-5555   (555)555-5555    (555) 555-5555

However, if there's an extra space inside the string it does not find the phone. 555 -555-5555   (555)555- 5555    (555) 555 -5555
DATA;

$regex = '~[(\d](?:(?!\h{2,})[-\d()\h])*\d~';

preg_match_all($regex, $data, $matches);
print_r($matches);
?>

Which yields

Array
(
    [0] => Array
        (
            [0] => 555-555-5555
            [1] => (555)555-5555
            [2] => (555) 555-5555
            [3] => 555 -555-5555
            [4] => (555)555- 5555
            [5] => (555) 555 -5555
        )

)

Upvotes: 1

Oscar Zarrus

Reputation: 790

If I understood correctly, you want to use only a regex, so you can add \s* around each separator, like this:

([0-9]{3})\)?\s*[-. ]?\s*([0-9]{3})\s*[-. ]?\s*([0-9]{4})\s*

This is based on the pattern from your question.
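Because this pattern keeps the three capture groups from the question, the captured digits can be reassembled into one canonical format. The sketch below assumes made-up sample numbers and a hypothetical $normalized array; it is one way to use the groups, not the only one.

```php
<?php
// Pattern copied from this answer, wrapped in / delimiters for PHP.
$pattern = '/([0-9]{3})\)?\s*[-. ]?\s*([0-9]{3})\s*[-. ]?\s*([0-9]{4})\s*/';

// Hypothetical OCR-style inputs with stray spaces.
$samples = ['212 -555-0100', '(212)555- 0100', '(212) 555 -0100'];

// Rebuild each number from the three captured digit groups.
$normalized = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m)) {
        $normalized[] = "{$m[1]}-{$m[2]}-{$m[3]}";
    }
}

print_r($normalized);
```

Note that \s* places no upper limit on the whitespace and also matches tabs and newlines, so this version is more permissive than "a space or two"; replacing \s* with something like [ ]{0,2} would tighten it.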


Upvotes: 0
