Reputation: 4136
I need a regex to extract a code_number, the requirements are:
Ideally this should be done using only one regex.
With the following regex I'm almost there. The problem is that it does not comply with the third requirement: it should not match 11111,
because that string lacks at least one letter.
$regex = '~\b(?=[a-zA-Z0-9]{5}\b)[a-zA-Z0-9]*\d[a-zA-Z0-9]*~';
$sms = ' 11111 keyrod 07:30 02.10.2013';
preg_match($regex, $sms, $matches);
print_r($matches); // print array([0] => 11111)
How can I change this regex so that it does not match a string consisting only of digits?
Upvotes: 2
Views: 5281
Reputation: 89639
Try this:
$subject = ' :::5: abcde4 abcd4 12345 abcde :a:1:';
$regex = '~(?<= |^)(?=\S{0,4}\d)(?=\S{0,4}[a-z])\S{5}(?= |$)~i';
preg_match_all($regex, $subject, $matches);
print_r($matches);
Explanation:
(?<=...) and (?=...) are, respectively, a lookbehind and a lookahead assertion. They test a condition before or after the current position and don't consume any characters (they are zero-width).
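To make the zero-width behaviour concrete, here is a minimal sketch. The string 'foobar' and the variable names are only illustrative and are not part of the question.

```php
<?php
// Hypothetical sample text, chosen only for illustration.
$text = 'foobar';

// The lookahead (?=bar) requires "bar" to follow, but "bar" is not part of the match.
preg_match('~foo(?=bar)~', $text, $ahead);

// The lookbehind (?<=foo) requires "foo" to precede, but "foo" is not part of the match.
preg_match('~(?<=foo)bar~', $text, $behind);

echo $ahead[0], "\n";  // foo
echo $behind[0], "\n"; // bar
```

In both cases the asserted text has to be present, yet only the part outside the assertion ends up in the match.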
In this case:
(?<= |^) --> a space or the beginning of the string before
(?= |$) --> a space or the end of the string after
The character class:
\S --> any character that is not whitespace (i.e. anything except space, tab, newline, ...)
The conditions:
At least one digit is forced by the lookahead:
(?=\S{0,4}\d)
It matches between 0 and 4 non-whitespace characters followed by a digit, so the digit must appear somewhere within the first five characters. In other words you can have:
1
x1
xx1
xxx1
xxxx1
The same applies to the letters with (?=\S{0,4}[a-z])
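As a rough sketch of how that lookahead behaves on its own, the snippet below tests it at the start of a few made-up tokens (the tokens and the ^ anchor are my additions for illustration). The digit is accepted in positions 1 to 5 and rejected in position 6.

```php
<?php
// Hypothetical tokens: the digit moves from position 1 to position 6.
$tokens = ['1xxxx', 'x1xxx', 'xx1xx', 'xxx1x', 'xxxx1', 'xxxxx1'];

$results = [];
foreach ($tokens as $token) {
    // ^ pins the test to the start of the token, so the lookahead
    // can only look at the first five characters.
    $results[$token] = preg_match('~^(?=\S{0,4}\d)~', $token) === 1;
}

var_export($results);
// The first five tokens map to true, 'xxxxx1' maps to false.
```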
The length of the string is forced by \S{5}
together with the first and last lookarounds, which require a space or a string boundary immediately before and after those five characters.
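One consequence of using \S is that punctuation counts as a valid character, so a token such as :a:1: also qualifies. The sketch below compares the pattern above with a variant that swaps \S for [a-z0-9]. That variant is my assumption, based on the character class used in the question, and not something the requirements confirm.

```php
<?php
$subject = ' :::5: abcde4 abcd4 12345 abcde :a:1:';

// Pattern from the answer: any non-whitespace characters are allowed.
$loose = '~(?<= |^)(?=\S{0,4}\d)(?=\S{0,4}[a-z])\S{5}(?= |$)~i';

// Assumed variant: only letters and digits are allowed.
$strict = '~(?<= |^)(?=[a-z0-9]{0,4}\d)(?=[a-z0-9]{0,4}[a-z])[a-z0-9]{5}(?= |$)~i';

preg_match_all($loose, $subject, $m);
$looseMatches = $m[0];   // ['abcd4', ':a:1:']

preg_match_all($strict, $subject, $m);
$strictMatches = $m[0];  // ['abcd4']

print_r($looseMatches);
print_r($strictMatches);
```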
Upvotes: 1
Reputation: 15311
Based on the rules you describe, nothing in your $sms string will match. But based on those rules, try this:
preg_match('~\b(?=[a-z0-9]{0,4}[a-z])(?=[a-z0-9]{0,4}[0-9])[a-z0-9]{5}\b~i', $subject, $matches);
Using your example string and Casimir's example string: http://codepad.viper-7.com/NA2mI5
Output:
//Your example string:
Array
(
)
//Other sample string:
Array
(
[0] => abcd4
)
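If it helps, the pattern can be wrapped in a small helper. This is only a sketch: the function name extractCodeNumber and the second test string (a variant of your $sms with a valid code in it) are made up for illustration, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the first 5-character alphanumeric token
// containing at least one letter and one digit, or null if there is none.
function extractCodeNumber(string $text): ?string
{
    $regex = '~\b(?=[a-z0-9]{0,4}[a-z])(?=[a-z0-9]{0,4}[0-9])[a-z0-9]{5}\b~i';
    return preg_match($regex, $text, $m) === 1 ? $m[0] : null;
}

// The original example string: 11111 has no letter, so nothing matches.
$none = extractCodeNumber(' 11111 keyrod 07:30 02.10.2013');

// Made-up variant with a valid code in place of 11111.
$code = extractCodeNumber(' a1b2c keyrod 07:30 02.10.2013');

var_dump($none); // NULL
var_dump($code); // string(5) "a1b2c"
```

Note that \b treats punctuation as a boundary, so unlike the space-delimited pattern, this one would also pick up a code that directly follows a colon or precedes a comma.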
Upvotes: 2