Reputation: 12512
I use the following regular expression to find a phone number in a string:
([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})
It works great on numbers like:
555-555-5555 (555)555-5555 (555) 555-5555
However, if there's an extra space inside the string it does not find the phone.
555 -555-5555 (555)555- 5555 (555) 555 -5555
Can it be modified to allow for a space or two? My input comes from OCR and not user input so I can't require formatted input.
Thanks.
Upvotes: 2
Views: 119
Reputation: 89614
To limit the number of extra spaces, you can use a lookahead to check the position of the first digit of the last group (you could check the last digit instead). Then all you have to do is describe the different separators the way you want.
~[(\d](?:\b\d{3}\)(?=.{3,5}\W\b) {0,2}\d{3}|\B\d{2}(?=.{4,6}\W\b)(?:- ?| -? ?)\d{3})(?:- ?| -? ?)\d{4}\b~
The same pattern in a more readable form:
~
[(\d] # first-character discrimination technique (avoids the cost of an alternation
# at the start of the pattern)
(?: # with brackets
\b \d{3} \)
(?= .{3,5} \W \b )
\g<spb> \d{3}
| # without brackets
\B \d{2} # you can also replace \B with (?<=\b\d) to check the word-boundary
(?= .{4,6} \W \b )
\g<sp> \d{3}
)
\g<sp> \d{4} \b
# subpattern definitions:
(?<spb> [ ]{0,2} ){0} # separator after bracket
(?<sp> - [ ]? | [ ] -? [ ]? ){0} # other separators
~x
Feel free to change - to [.-] or to define your own allowed separators. If you do, don't forget to adjust the quantifiers in the lookaheads as well. Also, if you want to allow the second separator to be empty, check the boundary after the last digit instead of the boundary before the first digit of the last group.
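For anyone not on PCRE: Python's re module has no \g<name> subroutine calls, so one way to keep the separators easy to swap is to splice them in as ordinary strings. This is only a minimal sketch of that idea, assuming Python instead of PCRE; the names SP_BRACKET, SP and PHONE are made up for illustration.

```python
import re

# Separator definitions, kept in one place so they are easy to change.
# Hypothetical names; they mirror the <spb> and <sp> subpatterns above.
SP_BRACKET = r" {0,2}"   # separator after the closing bracket
SP = r"(?:- ?| -? ?)"    # other separators

PHONE = re.compile("".join([
    r"[(\d]",                                          # first character: ( or a digit
    r"(?:",
    r"\b\d{3}\)(?=.{3,5}\W\b)", SP_BRACKET, r"\d{3}",  # with brackets
    r"|",
    r"\B\d{2}(?=.{4,6}\W\b)", SP, r"\d{3}",            # without brackets
    r")",
    SP, r"\d{4}\b",
]))

samples = [
    "555-555-5555",
    "(555)555-5555",
    "(555) 555-5555",
    "555 -555-5555",
    "(555)555- 5555",
    "(555) 555 -5555",
]

for s in samples:
    m = PHONE.search(s)
    print(repr(s), "->", m.group(0) if m else None)
```

Swapping the hyphen in SP for [.-] is then a one-line change. As noted above, the lookahead quantifiers may still need adjusting if the separators you allow get longer or shorter.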
Upvotes: 0
Reputation: 48011
I feel like you are asking for a very lenient / inclusive pattern.
This one is pretty forgiving: /\(?\d{3}\)? {0,2}[-.]? {0,2}\d{3} {0,2}[-.]? {0,2}\d{4}/
It will match all of these variants (...and more):
555-555-5555
(555)555-5555
(555) 555-5555
555 -555-5555
(555)555- 5555
(555) 555 -5555
555.555-5555
555.555.5555
5555555555
555-555.5555
(555)5555555
(555).555.5555
(555)-555-5555
(555555-5555
555)-555-5555
555555-5555
555 5555555
555 555 5555
555 - 555 - 5555
555555 . 5555
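The pattern logic is in this order:
optionally match (
match three digits
optionally match )
match zero to two spaces
optionally match a hyphen or a dot
match zero to two spaces
match three digits
match zero to two spaces
optionally match a hyphen or a dot
match zero to two spaces
match four digits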
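The forgiving pattern uses no PCRE-only syntax, so it can be tried unchanged in other engines. Here is a minimal sketch, assuming Python's re module; LENIENT_PHONE and find_phones are illustrative names, and the samples are the six strings from the question plus a few of the variants listed above.

```python
import re

# Same pattern as above, without the PHP-style / delimiters.
LENIENT_PHONE = re.compile(
    r"\(?\d{3}\)? {0,2}[-.]? {0,2}\d{3} {0,2}[-.]? {0,2}\d{4}"
)

def find_phones(text):
    """Return every substring of text that the lenient pattern matches."""
    return [m.group(0) for m in LENIENT_PHONE.finditer(text)]

samples = [
    "555-555-5555",
    "(555)555-5555",
    "(555) 555-5555",
    "555 -555-5555",
    "(555)555- 5555",
    "(555) 555 -5555",
    "555.555.5555",
    "5555555555",
    "555 - 555 - 5555",
]

for s in samples:
    print(repr(s), bool(LENIENT_PHONE.fullmatch(s)))

print(find_phones("scan: 555 -555-5555 and (555)555- 5555"))
```

Because each gap is written as {0,2}, a run of three or more spaces next to a separator is not accepted, which keeps the pattern from stretching across unrelated digits.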
Upvotes: 0
Reputation: 43179
As per your examples, you could use
[(\d](?:(?!\h{2,})[-\d()\h])*\d
[(\d] # one of ( or 0-9
(?: # a non-capturing group
(?!\h{2,}) # make sure 2+ horizontal whitespace characters are not immediately ahead
[-\d()\h] # then match one of -, 0-9, (, ) or a horizontal whitespace character
)* # zero or more times
\d # the end must be a digit
It is a variation of the tempered greedy token.
In PHP, this could be:
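<?php
// Each number sits on its own line; \h never matches a newline, so matches cannot run together.
$data = <<<DATA
555-555-5555
(555)555-5555
(555) 555-5555
However, if there's an extra space inside the string it does not find the phone.
555 -555-5555
(555)555- 5555
(555) 555 -5555
DATA;
// Start at ( or a digit, continue while no 2+ horizontal whitespace follows, end on a digit.
$regex = '~[(\d](?:(?!\h{2,})[-\d()\h])*\d~';
preg_match_all($regex, $data, $matches);
print_r($matches);
?>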
Which yields
Array
(
    [0] => Array
        (
            [0] => 555-555-5555
            [1] => (555)555-5555
            [2] => (555) 555-5555
            [3] => 555 -555-5555
            [4] => (555)555- 5555
            [5] => (555) 555 -5555
        )

)
Upvotes: 1
Reputation: 790
If I understood correctly, you want to use only a regex, so you can add \s* around each separator, like this:
([0-9]{3})\)?\s*[-. ]?\s*([0-9]{3})\s*[-. ]?\s*([0-9]{4})\s*
This is based on the pattern from your question.
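Since this pattern keeps the three capture groups, the digits can be pulled out and re-joined in one canonical format once a match is found. Here is a minimal sketch, assuming Python's re module; PHONE, normalize_phones and ocr_text are illustrative names, and the sample text is made up from the question's examples. Note that \s* has no upper bound and also matches tabs and newlines, so it is looser than "a space or two".

```python
import re

# Pattern from above: the three digit groups are captured,
# and \s* absorbs any whitespace around the separators.
PHONE = re.compile(
    r"([0-9]{3})\)?\s*[-. ]?\s*([0-9]{3})\s*[-. ]?\s*([0-9]{4})\s*"
)

def normalize_phones(text):
    """Find phone numbers in text and return them as NNN-NNN-NNNN strings."""
    return ["-".join(m.groups()) for m in PHONE.finditer(text)]

# Hypothetical OCR output containing the problem cases from the question.
ocr_text = "Call 555 -555-5555 or (555)555- 5555, fax (555) 555 -5555"
print(normalize_phones(ocr_text))
```

If the unbounded whitespace turns out to be too permissive for your OCR output, replacing each \s* with a bounded space count such as [ ]{0,2} is one possible tightening.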
Upvotes: 0