lovespring

Reputation: 19559

strlen, mb_strlen, which to use?

How can I find out which character set the values in $_REQUEST use? And how can I set the character set of $_REQUEST?

Upvotes: 24

Views: 23350

Answers (3)

bucabay

Reputation: 5295

Usually you have control of the character encoding since you create the $_REQUEST from the HTML you send to the client.

That is, the request is generated by a page you sent from PHP.

Thus you shouldn't have to detect the encoding.

Using the mb_functions requires enabling the multibyte extension - so if you're distributing code, you have to be aware not everyone will have it.
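If mbstring might be missing on the target system, one option is to check for it at runtime and fall back to a byte-level count. The following is only a minimal sketch: `utf8_length` is a hypothetical helper name, and the fallback assumes the input is already valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): count the characters of a UTF-8 string.
function utf8_length(string $s): int
{
    // Prefer mbstring when the extension is loaded.
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    // Fallback: count every byte that is NOT a UTF-8 continuation byte
    // (10xxxxxx, i.e. 0x80-0xBF). For valid UTF-8 this equals the number
    // of code points. No /u modifier, so the pattern works on raw bytes.
    return (int) preg_match_all('/[^\x80-\xBF]/', $s);
}
```

For example, `utf8_length("h\u{00E9}llo")` gives 5, while `strlen()` on the same string gives 6 because the accented letter occupies two bytes.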

To declare the encoding of the page you send, set it in the HTTP header from PHP:

header('Content-Type: text/html; charset=UTF-8');

OR in HTML:

<meta charset="utf-8">

http://www.w3.org/International/O-charset

Edit: PHP 6 was planned with native Unicode support but was never released; PHP 5 has no native UTF-8 support.

Upvotes: 1

Stefan Gehrig

Reputation: 83622

In short: you cannot really know the encoding (character set) of the variables passed to your PHP script via GET or POST (GET is especially problematic). By convention, browsers POST forms to the server-side resource named in the action attribute using the page encoding, which can be specified via an http-equiv meta tag (the charset meta tag in HTML5) or via an HTTP header. Some browsers also respect the accept-charset attribute on the form when choosing the encoding.

The encoding of GET parameters and of the URL itself depends on the browser settings and can therefore be controlled by the user. You should not rely on a specific encoding.

Generally you will avoid most encoding-related problems by consistently using UTF-8 for everything and by specifying the correct encoding in the HTTP header (Content-Type: text/html; charset=UTF-8). This will yield the correct encoding (UTF-8) in all the variables that are passed into your script (leaving aside rogue scripts that deliberately tamper with the encoding to open up attack vectors). You also should not rely on non-ASCII characters in your GET parameters or in the URL (which is one reason why SEO-friendly links remove or substitute those characters).

If you made sure that UTF-8 is the only allowed character-set you can use mb_strlen($string, 'UTF-8') to check the length of a variable for example.
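As a rough illustration of that last point, the sketch below first checks that a value really is valid UTF-8 and only then measures it. `utf8_length_or_null` is a hypothetical helper name, and the sample strings are made up; the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: return the character count of a UTF-8 string,
// or null when the bytes are not valid UTF-8.
function utf8_length_or_null(string $value): ?int
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        return null; // reject input in an unexpected encoding
    }
    return mb_strlen($value, 'UTF-8');
}

$name = "na\u{00EF}ve";                 // "naive" with a diaeresis; that letter takes 2 bytes
var_dump(strlen($name));                // int(6): number of bytes
var_dump(utf8_length_or_null($name));   // int(5): number of characters
var_dump(utf8_length_or_null("\xFF"));  // NULL: 0xFF never occurs in valid UTF-8
```

The difference between the first two results is why strlen() is unsuitable for length limits on multibyte text: it counts bytes, not characters.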

EDIT: (added some links)

Some things for you to read:

Upvotes: 22

RageZ

Reputation: 27313

Use mb_internal_encoding() to find out which encoding is currently set. If your application uses a lot of different encodings, you are better off using mb_strlen.
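A small sketch of how that might look, assuming the mbstring extension is loaded (the sample string is made up for illustration):

```php
<?php
// With no argument, mb_internal_encoding() returns the current setting.
// The value depends on the PHP configuration, so none is shown here.
echo mb_internal_encoding(), "\n";

// With an argument, it sets the internal encoding and returns true on success.
mb_internal_encoding('UTF-8');

$s = "\u{65E5}\u{672C}\u{8A9E}"; // three CJK characters, three bytes each in UTF-8

var_dump(mb_strlen($s));          // int(3): uses the internal encoding set above
var_dump(strlen($s));             // int(9): raw byte count
var_dump(mb_strlen($s, 'UTF-8')); // int(3): encoding passed explicitly
```

Passing the encoding explicitly to mb_strlen avoids depending on a global setting, which may matter when different parts of an application handle different encodings.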

Cheers

Upvotes: 5
