Kroptokin

Reputation: 1

Should I care about multi-byte character strings when using preg_match in PHP?

I can't seem to find a straight answer to this question.

If my pattern does not contain characters outside the ASCII range, do I need the /u modifier? The documentation seems to suggest not. If the string being matched is UTF-8, will I still match the characters I want?

Thanks

Upvotes: 0

Views: 146

Answers (3)

nobody

Reputation: 10645

Take /^.$/, which matches a string consisting of a single character, for example.

var_dump( preg_match( '/^.$/u','族' ) );
var_dump( preg_match( '/^.$/','族' ) );

result:

int(1)
int(0)

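So yes, /u does make a difference even when the pattern itself contains no characters outside the ASCII range. Without /u the subject is treated as a sequence of bytes, and '族' is three bytes in UTF-8, so a single . cannot span the whole string.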
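To make the byte-versus-character difference visible, here is a minimal sketch that counts how many times . matches the same one-character string in each mode. The variable names are only illustrative, and the character is written as explicit byte escapes on the assumption that the source file's encoding might not be UTF-8.

```php
<?php
// "族" (U+65CF) written as its three UTF-8 bytes, so the example does not
// depend on the encoding of this source file.
$subject = "\xE6\x97\x8F";

// Without /u, "." consumes one byte at a time.
$byteMatches = preg_match_all('/./', $subject);

// With /u, "." consumes one whole UTF-8 character.
$charMatches = preg_match_all('/./u', $subject);

var_dump(strlen($subject)); // int(3): length in bytes
var_dump($byteMatches);     // int(3): one match per byte
var_dump($charMatches);     // int(1): one match per character
```

This is why /^.$/ fails without /u: the anchors require exactly one ., but in byte mode the subject is three units long.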

Upvotes: 0

Marc B

Reputation: 360682

What matters is not whether the pattern contains non-ASCII characters, but whether the string you're matching against does. You may not be looking for non-ASCII characters, but if the string contains any multibyte characters, then without /u the pattern is applied byte by byte and MAY match one of the individual bytes of a UTF-8 character.
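A small sketch of when that can and cannot happen, assuming a UTF-8 subject (the variable names are made up for illustration). A plain ASCII literal behaves the same in both modes, because UTF-8 only uses bytes of 0x80 and above inside a multibyte character. Constructs that match "any" or "anything except" are the ones that can grab a partial character.

```php
<?php
// "é" (U+00E9) as its two UTF-8 bytes, followed by ASCII text.
$subject = "\xC3\xA9abc";

// An ASCII literal is found either way: no byte of a multibyte
// character is below 0x80, so "abc" cannot match part of one.
$literalBytes = preg_match('/abc/', $subject);
$literalUtf8  = preg_match('/abc/u', $subject);

// A negated class is different. Without /u it matches only the first
// byte of "é"; with /u it matches the whole character.
preg_match('/[^a-z]/', $subject, $byteMode);
preg_match('/[^a-z]/u', $subject, $utf8Mode);

var_dump($literalBytes, $literalUtf8); // int(1) int(1)
var_dump(bin2hex($byteMode[0]));       // string(2) "c3"
var_dump(bin2hex($utf8Mode[0]));       // string(4) "c3a9"
```

In byte mode the captured text is half a character, which is no longer valid UTF-8 if it is later printed or stored.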

Upvotes: 1

Justin Morgan

Reputation: 30760

I can't test your second question because I don't have a PHP environment in front of me, but the answer to the first question is no. If both the pattern and the strings you match against contain only ASCII characters, there's no need for /u.

Upvotes: 0
