Eric

Reputation: 1261

Why do I need a space in this regex?

I'm trying to match either Eu or U.s. using PHP's preg_match_all.
Given the following sentence:

The Eu is better than the U.s. in certain ways.

I can match both Eu and U.s. if I use:

preg_match_all("/\b(Eu|U\.s\. )\b/", $input_lines, $output_array);

but not if I use:

preg_match_all("/\b(Eu|U\.s\.)\b/", $input_lines, $output_array);

Why do I need a space after the . in order for my regex to work?

Upvotes: 1

Views: 67

Answers (2)

cottton

Reputation: 1607

What @mmta41 said. Here is a test:

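// The dots are escaped so they match a literal "." rather than any character.
// The \b anchors mark the word boundaries around "U" and "s".
$re = '/(eu|\bU\b\.\bs\b\.)/mi';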
$str = 'U.s.,u.S., U.S. , u.s.. ,Eu,eU, EU , eu.Europe UseuUs Europe';

preg_match_all($re, $str, $matches);

print_r($matches);

see http://sandbox.onlinephpfunctions.com/code/9f435a11609606cf7f8d4f5e330d443989911c5b

Upvotes: 1

mmta41

Reputation: 372

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.

There are three different positions that qualify as word boundaries:

  1. Before the first character in the string, if the first character is a word character (\w).

  2. After the last character in the string, if the last character is a word character.

  3. Between two characters in the string, where one is a word character and the other is not a word character.
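The three rules above can be checked by listing every offset at which \b matches. This is only a minimal sketch: the sample string `$text` is an assumption cut from the question's sentence, and the offsets are byte positions.

```php
<?php
// Hypothetical sample taken from the question's sentence.
// Offsets: t=0 h=1 e=2 ' '=3 U=4 .=5 s=6 .=7 ' '=8 i=9 n=10
$text = 'the U.s. in';

// \b is zero-length, so every match is an empty string plus its offset.
preg_match_all('/\b/', $text, $m, PREG_OFFSET_CAPTURE);

// Keep only the offsets (index 1 of each [match, offset] pair).
$offsets = array_column($m[0], 1);

echo implode(', ', $offsets), PHP_EOL; // 0, 3, 4, 5, 6, 7, 9, 11
```

Offset 8 is missing from the list: the final "." and the space after it are both non-word characters, so there is no word boundary between them.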

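In your case rule 3 applies, and the boundaries in U.s. fall like this: \b U \b . \b s \b . Note that there is no \b after the final dot: it is followed by a space, and neither of them is a word character. That is why the trailing \b in your second pattern fails. With the extra space in the pattern, the trailing \b can match between the space and the "i" of "in".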
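To see this on the sentence from the question, here is a small sketch that runs both of the asker's patterns and then one possible alternative. The alternative, which replaces the trailing \b with the negative lookahead (?!\w), is my own suggestion and not something either pattern in the question uses; the variable names are made up for the example.

```php
<?php
$input = 'The Eu is better than the U.s. in certain ways.';

// Question's pattern without the space: the trailing \b cannot match
// between "." and " ", so the U\.s\. alternative never succeeds.
preg_match_all('/\b(Eu|U\.s\.)\b/', $input, $noSpace);

// Question's pattern with the space: the trailing \b now sits between
// " " and "i", which is a word boundary, but the space gets captured.
preg_match_all('/\b(Eu|U\.s\. )\b/', $input, $withSpace);

// Assumed alternative: (?!\w) only requires that no word character
// follows, so it also works after a dot and at the end of the string.
preg_match_all('/\b(Eu|U\.s\.)(?!\w)/', $input, $lookahead);

print_r($noSpace[1]);   // Eu
print_r($withSpace[1]); // Eu, "U.s. " (with trailing space)
print_r($lookahead[1]); // Eu, U.s.
```

The lookahead version keeps the captured text free of the trailing space, which may or may not matter depending on what you do with the matches.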

Upvotes: 4
