woutr_be

Reputation: 9722

base64_encode() with chinese characters

I'm currently finishing off an ecard system that allows users to send cards to other people by mail.

The mail also contains a link to view the same card in the browser. This link is generated by encoding the card text with base64_encode():

'http://www.test.com/ecards?card=' . base64_encode('your text'); // like this

This works fine for English text, but once I enter some Chinese and visit the link, the characters are all messed up:

汉��N�B��Y][ۘ[�[�\�N�9�(�*���B�[�Z[��0��q��B��[\Y�YY�[�\�N�9cc�+�N�B��Y][ۘ[�[�\�N�:#��*���B��[�\�N�9.+y���

It has nothing to do with my charset, which is set to UTF-8. I even printed the same Chinese text directly and it shows up perfectly.

So I'm wondering if base64_encode() and base64_decode() might have something to do with this.

// Doesn't work
echo base64_decode($body);

// Chinese characters show up fine!!!!
echo 'simplified Chinese: 汉语; <br />';
echo 'traditional Chinese: 漢語; <br />';
echo 'Pinyin: Hànyǔ; <br />';
echo 'simplified Chinese: 华语; <br />';
echo 'traditional Chinese: 華語; <br />';
echo 'Chinese: 中文; <br />';

EDIT: When I output $_GET['card'] using a URL like http://www.test.com/ecards?card=中文, it works fine. So it looks like base64_encode() or base64_decode() can't handle Chinese characters.

Upvotes: 1

Views: 6344

Answers (1)

Jon

Reputation: 437574

The base64_ functions do not operate on characters; they operate on bytes. They will happily convert any byte sequence you pass in without error, and decoding returns exactly the bytes that were encoded.
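To see the byte-level behaviour for yourself, here is a minimal sketch. It assumes the script file itself is saved as UTF-8; the variable names are only for illustration.

```php
<?php
// Assumption: this source file is saved as UTF-8, so the literal below is UTF-8 bytes.
$text = '汉语 漢語 Hànyǔ 中文';

$encoded = base64_encode($text);
// Strict mode returns false if the input contains characters outside the base64 alphabet.
$decoded = base64_decode($encoded, true);

// base64 round-trips bytes, so the decoded string is byte-for-byte identical.
var_dump($decoded === $text); // bool(true)

// strlen() counts bytes, not characters: '汉' takes three bytes in UTF-8.
var_dump(strlen('汉')); // int(3)
```

If the round trip is lossless here but the card page still shows garbage, the bytes are being changed or reinterpreted somewhere outside these two calls.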

Your problem here is that the encoding of the characters you are using as input does not match the encoding of the page where they are displayed after decoding. Where does "your text" come from? If it's from a form submission, you need to make sure that the page where the form appears is displayed using the same encoding as the "view card" page, or that the form has an accept-charset attribute matching the encoding of the "view card" page.
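As a rough sketch of keeping both ends consistent, the following assumes the card text arrives as UTF-8 and the "view card" page is served as UTF-8. The helpers buildCardLink() and readCardText() are hypothetical names, not part of your code. It also URL-encodes the base64 output, because base64 can contain '+', '/' and '=', and PHP turns a raw '+' in a query string into a space. That is an additional precaution I am assuming may be relevant, not something confirmed from your description.

```php
<?php
// Hypothetical helper: builds the "view card" link from text assumed to be UTF-8.
function buildCardLink(string $text): string
{
    // rawurlencode() protects the '+', '/' and '=' that base64 output may contain.
    return 'http://www.test.com/ecards?card=' . rawurlencode(base64_encode($text));
}

// Hypothetical helper: recovers the text from the already URL-decoded parameter.
function readCardText(string $param): ?string
{
    // Strict mode returns false for characters outside the base64 alphabet.
    $bytes = base64_decode($param, true);
    return $bytes === false ? null : $bytes;
}

$link = buildCardLink('汉语 漢語 中文');

// parse_str() decodes the query string the same way PHP populates $_GET.
parse_str(parse_url($link, PHP_URL_QUERY), $query);
$text = readCardText($query['card']) ?? '';

// On the real page, declare the same charset the form page uses before any output:
// header('Content-Type: text/html; charset=utf-8');
echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
```

On the form side, the matching piece would be serving that page as UTF-8 too, or adding accept-charset="UTF-8" to the form element.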

Upvotes: 5
