François

Reputation: 1852

How to validate a user-supplied charset string for mb_*() functions?

A variable ($my_charset below) holds a user-supplied charset / encoding string, expected to be something like UTF-8, ISO-8859-1, or Windows-1251. How can I validate it programmatically? I also have user-supplied text ($my_text below) available, supposedly in this encoding.

My solution so far:

$is_valid = @mb_check_encoding($my_text, $my_charset);

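I don't like it, because it relies on the error control operator (@), and because a false result does not tell me whether the encoding is unsupported or the text is just incorrectly encoded.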

PHP provides mb_list_encodings() and mb_encoding_aliases(), which I could use to build a list of all supported encodings and then check, case-insensitively, whether the user-supplied encoding is in that list. I don't like this solution either: it seems like overkill, since it needs to call mb_encoding_aliases() for each item returned by mb_list_encodings() (over 50).

Do you have a better solution?

Upvotes: 2

Views: 238

Answers (1)

user3942918

Reputation: 26413

You can validate the supplied charset on its own with:

$is_valid = @mb_check_encoding('', $my_charset);

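The error control operator may be icky, but there's nothing wrong with using it here. It exists for a reason beyond evil. And because an empty string is valid in any encoding, a false result can only mean the encoding itself isn't supported, so you don't have to worry about conflating an unsupported encoding with incorrectly encoded text.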
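One caveat, as far as I know: on PHP 8.0 and later the mb_*() functions raise a ValueError for an unknown encoding name instead of emitting a warning, and @ does not suppress exceptions. A minimal sketch that should behave the same on both older and newer versions wraps the call in a try/catch. The function name is_supported_encoding is made up for this example.

```php
<?php
// Hypothetical helper (the name is not from the original answer).
// Returns true if mbstring recognizes $charset as an encoding name or alias.
function is_supported_encoding(string $charset): bool
{
    try {
        // The empty string is valid in any encoding, so the result depends
        // only on whether the encoding name is recognized.
        // The @ silences the warning that older PHP versions emit.
        return @mb_check_encoding('', $charset);
    } catch (\ValueError $e) {
        // Assumption: PHP 8.0+ throws ValueError for an unknown encoding.
        return false;
    }
}
```

With this, is_supported_encoding($my_charset) answers only the "is this a known charset" question; checking $my_text against that charset can then be a separate mb_check_encoding() call.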


If you still want to avoid that, using mb_list_encodings and mb_encoding_aliases isn't overkill - ~50 encodings with ~4 aliases each is not a lot. Though if you don't want to run those on each request you can use them to generate a static array and load that instead.

Example:

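// Start from the canonical names, then append every alias of each one.
$encodings = mb_list_encodings();
foreach ($encodings as $enc) {
    $encodings = array_merge($encodings, mb_encoding_aliases($enc));
}
// Turn the names into lowercase keys so a lookup is a single isset().
$encodings = array_change_key_case(array_fill_keys($encodings, true));
// Print the map as PHP source.
var_export($encodings);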

This'll dump out valid PHP that you can paste directly into a php file. You could instead serialize it with serialize or json_encode and unserialize it later, whatever you're into.

It uses the encodings as keys instead of values so your lookup will happen in O(1) time rather than O(n). The array_change_key_case is there to lowercase them all for easy lookup with:

$is_valid = isset($encodings[strtolower($my_charset)]);
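To tie the pieces together, here is a rough sketch of the "generate once, load later" idea. It writes the map to a PHP file the first time and simply requires it afterwards. The cache path and the function name is_known_encoding are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical cache location; any writable path outside the web root would do.
$cacheFile = sys_get_temp_dir() . '/mb_encodings_map.php';

if (!is_file($cacheFile)) {
    // Collect canonical names plus all of their aliases.
    $names = mb_list_encodings();
    foreach (mb_list_encodings() as $enc) {
        $names = array_merge($names, mb_encoding_aliases($enc));
    }
    // Lowercased names become the keys of the lookup map.
    $map = array_change_key_case(array_fill_keys($names, true));
    // Store the map as a PHP file that returns the array.
    file_put_contents($cacheFile, '<?php return ' . var_export($map, true) . ';');
}

// Later requests only pay for loading the generated array.
$encodings = require $cacheFile;

// Hypothetical helper: case-insensitive lookup against the map.
function is_known_encoding(array $encodings, string $charset): bool
{
    return isset($encodings[strtolower($charset)]);
}
```

The generated file would need to be rebuilt after a PHP upgrade, since the set of supported encodings and aliases can change between versions.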

Upvotes: 2
