Reputation: 1852
A variable ($my_charset below) holds a user-supplied charset / encoding string, expected to be something like UTF-8, ISO-8859-1, or Windows-1251. How can I programmatically validate it? I also have user-supplied text ($my_text below) at my disposal, supposedly in this encoding.
My solution so far:
$is_valid = @mb_check_encoding($my_text, $my_charset);
I don't like it, because:
- It needs @ to suppress errors (like Warning: mb_check_encoding(): Invalid encoding "some-invalid-encoding"), which is bad programming practice.
- The false return value doesn't help me distinguish a bad charset string from a valid charset string with incorrectly encoded text.
PHP provides the mb_list_encodings() and mb_encoding_aliases() functions, which I could use to build a list of all supported encodings and then check, case-insensitively, whether the user-supplied encoding is in that list. I don't like this solution either: it seems like overkill, because it needs to call mb_encoding_aliases() for each item returned by mb_list_encodings() (over 50).
Do you have a better solution?
Upvotes: 2
Views: 238
Reputation: 26413
You can validate the supplied charset on its own with:
$is_valid = @mb_check_encoding('', $my_charset);
The error control operator may be icky, but there's nothing wrong with using it here. It exists for a reason beyond evil. And because the text being checked is an empty string, you don't have to worry about conflating an unsupported encoding with incorrectly encoded text.
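One caveat, as far as I know: since PHP 8.0, mb_check_encoding() throws a ValueError for an unknown encoding name instead of emitting a warning, and @ does not suppress exceptions. A minimal sketch that should behave the same on PHP 7 and PHP 8 follows; is_supported_charset is a hypothetical helper name, not something from the answer above.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the answer).
function is_supported_charset(string $charset): bool
{
    try {
        // An empty string has no bytes that could be malformed, so a
        // failure here can only come from the encoding name itself.
        return @mb_check_encoding('', $charset);
    } catch (\ValueError $e) {
        // PHP 8+: unknown encoding names raise ValueError.
        return false;
    }
}
```

On PHP 7 the catch branch is simply never taken, because the function returns false with a (suppressed) warning there.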
If you still want to avoid that, using mb_list_encodings and mb_encoding_aliases isn't overkill: ~50 encodings with ~4 aliases each is not a lot. Though if you don't want to run those on each request, you can use them to generate a static array and load that instead.
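// Start with the canonical encoding names, then append every alias.
$encodings = mb_list_encodings();
foreach ($encodings as $enc) {
    $encodings = array_merge($encodings, mb_encoding_aliases($enc));
}
// Turn the names into lowercase keys so lookups can use isset().
$encodings = array_change_key_case(array_fill_keys($encodings, true));
// Print the array as PHP source.
var_export($encodings);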
This'll dump out valid PHP that you can paste directly into a PHP file. You could instead serialize it with serialize or json_encode and unserialize it later, whatever you're into.
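If pasting the output by hand is inconvenient, the same idea can be automated. The sketch below writes the table to a PHP file once and loads it with require afterwards. The cache path and the $cacheFile and $table names are assumptions made for illustration.

```php
<?php
// Hypothetical cache location; adjust to your project layout.
$cacheFile = sys_get_temp_dir() . '/mb_encodings_cache.php';

if (!is_file($cacheFile)) {
    // Collect canonical names plus all aliases.
    $names = mb_list_encodings();
    foreach (mb_list_encodings() as $enc) {
        $names = array_merge($names, mb_encoding_aliases($enc));
    }
    $table = array_change_key_case(array_fill_keys($names, true));
    // var_export(..., true) returns the PHP source instead of printing it.
    file_put_contents($cacheFile, '<?php return ' . var_export($table, true) . ';');
}

// A file that ends in "return <array>;" hands that array back to require.
$encodings = require $cacheFile;
```

The cache would need to be regenerated after a PHP upgrade, since the set of supported encodings can change between versions.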
It uses the encodings as keys instead of values so your lookup will happen in O(1) time rather than O(n). The array_change_key_case is there to lowercase them all for easy lookup with:
$is_valid = isset($encodings[strtolower($my_charset)]);
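Putting the pieces together, the lookup also makes it possible to tell a bad charset name apart from badly encoded text, which was the second complaint in the question. This is only a sketch: build_charset_lookup and classify_input are hypothetical names, and the three result strings are an arbitrary choice.

```php
<?php
// Build the lowercase lookup table (same approach as the snippet above).
function build_charset_lookup(): array
{
    $names = mb_list_encodings();
    foreach (mb_list_encodings() as $enc) {
        $names = array_merge($names, mb_encoding_aliases($enc));
    }
    return array_change_key_case(array_fill_keys($names, true), CASE_LOWER);
}

// Hypothetical helper: returns 'bad_charset', 'bad_text', or 'ok'.
function classify_input(string $text, string $charset, array $lookup): string
{
    // Check the name first, so mb_check_encoding() never sees an
    // unknown encoding and no error suppression is needed.
    if (!isset($lookup[strtolower($charset)])) {
        return 'bad_charset';
    }
    return mb_check_encoding($text, $charset) ? 'ok' : 'bad_text';
}

$lookup = build_charset_lookup();
```

For example, classify_input($my_text, $my_charset, $lookup) can then drive separate error messages for the two failure cases.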
Upvotes: 2