Reputation: 1261
I'm trying to match either the Eu or U.s. using PHP's preg_match_all.
Given the following sentence:
The Eu is better than the U.s. in certain ways.
I can match both Eu and U.s. if I use:
preg_match_all("/\b(Eu|U\.s\. )\b/", $input_lines, $output_array);
but not if I use:
preg_match_all("/\b(Eu|U\.s\.)\b/", $input_lines, $output_array);
Why do I need a space after the . in order for my regex to work?
Upvotes: 1
Views: 67
Reputation: 1607
What @mmta41 said. Here is a test:
$re = '/(eu|\bU\b\.\bs\b\.)/mi'; // dots escaped so they match a literal "." only
$str = 'U.s.,u.S., U.S. , u.s.. ,Eu,eU, EU , eu.Europe UseuUs Europe';
preg_match_all($re, $str, $matches);
print_r($matches);
see http://sandbox.onlinephpfunctions.com/code/9f435a11609606cf7f8d4f5e330d443989911c5b
Upvotes: 1
Reputation: 372
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.
There are three different positions that qualify as word boundaries:
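1. Before the first character in the string, if the first character is a word character (\w).
2. After the last character in the string, if the last character is a word character.
3. Between two characters in the string, where one is a word character and the other is not a word character.
So in your case, which is number 3, U.s. is bounded like this: \b U \b . \b s \b . There is no word boundary after the final ., because both the . and the space that follows it are non-word characters, so the trailing \b in your second pattern cannot match there. With the extra space in the pattern, \b matches between that space and the i of "in", which is why the first pattern works.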
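To see this concretely, here is a minimal sketch using the sentence from the question. The variable names ($strict, $fixed, $boundaries) are my own, and the (?!\w) lookahead is just one possible workaround I'm assuming fits your use case, not the only way to do it:

```php
<?php
$input = 'The Eu is better than the U.s. in certain ways.';

// Original pattern: the trailing \b fails after "U.s." because "." and " "
// are both non-word characters, so only "Eu" is found.
preg_match_all('/\b(Eu|U\.s\.)\b/', $input, $strict);
print_r($strict[1]); // Eu

// Possible workaround: instead of requiring a word boundary at the end,
// only require that the next character is not a word character.
preg_match_all('/\b(Eu|U\.s\.)(?!\w)/', $input, $fixed);
print_r($fixed[1]); // Eu, U.s.

// List the byte offsets of every word boundary in the sentence.
// "U.s." starts at offset 26, so boundaries appear at 26, 27, 28 and 29,
// but not at 30 (between the final "." and the space).
preg_match_all('/\b/', $input, $m, PREG_OFFSET_CAPTURE);
$boundaries = array_column($m[0], 1);
print_r($boundaries);
```

The lookahead version also keeps the captured text free of the trailing space that the "U\.s\. " pattern drags along.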
Upvotes: 4