Zydeco

Reputation: 530

Guides for dealing with Unicode in PHP5?

Hey everybody. I'm developing a new site (PHP 5/MySQL) and am looking to finally get on the Unicode bandwagon. I'll admit that I know next to nothing about supporting Unicode at the moment, but I'm hoping to resolve that with your help.

After desperately flexing my tiny, pathetic excuses for Google-fu muscles and scouring every page that looked promising to my Unicode-newbie eyes, I have come to the conclusion that, while Unicode is not entirely supported, my precious language of choice (PHP, for those who have forgotten) has made at least a half-assed attempt at managing the foreign beast (and, from what else I see, is succeeding?). I have also come to the conclusion that

<?php header('Content-Type: text/html; charset=utf-8'); ?>

is a great place to start and that I should be looking into supporting UTF-8 since I have plenty of space on my (shared, for the moment) hosting.
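A minimal sketch of such a page bootstrap, assuming the mbstring extension is enabled. The mb_internal_encoding() call goes beyond the header line quoted above: it makes the mb_* functions default to UTF-8 when no encoding argument is passed.

```php
<?php
// Page bootstrap sketch: run this before any output is sent.
if (!headers_sent()) {
    // Tell the browser that the page body is UTF-8.
    header('Content-Type: text/html; charset=utf-8');
}

// Make mb_* functions assume UTF-8 when no encoding argument is given.
mb_internal_encoding('UTF-8');
```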

However, I'm not sure what this strange family of functions known as mb_* is, or how it relates to functions such as strlen(). To be honest, at this point I don't know what other functionality (that I can't live without) is affected.
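To see why mb_* matters, one can compare byte-based and character-based functions on a word containing a single non-ASCII letter. This is a minimal sketch assuming the mbstring extension is enabled; the string is written with \x escapes so that it is UTF-8 no matter how the source file itself is saved.

```php
<?php
// "naïve" in UTF-8: the "ï" is stored as the two bytes C3 AF.
$word = "na\xC3\xAFve";

// strlen() counts bytes, mb_strlen() counts characters.
$bytes = strlen($word);              // 6
$chars = mb_strlen($word, 'UTF-8');  // 5

// substr() works on bytes and can cut "ï" in half, leaving invalid UTF-8;
// mb_substr() works on whole characters.
$brokenPrefix = substr($word, 0, 3);             // "na" plus the lone byte C3
$safePrefix   = mb_substr($word, 0, 3, 'UTF-8'); // "naï"

// Case conversion that understands non-ASCII letters.
$upper = mb_strtoupper($word, 'UTF-8');          // "NAÏVE"
```

The general pattern is that most str* functions have an mb_* counterpart taking an optional encoding argument.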

So I've come to you SO-ites in search of enlightenment, and possibly to get my confused (where Unicode is concerned!) brain straightened out. I really want to support it, but I need serious help.

P.S.: Does Unicode affect mysql_real_escape_string() or any other security measures (SQL injection and XSS prevention)? I need to stay on top of this as well!
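On the P.S.: for XSS, one Unicode-related detail is telling htmlspecialchars() which charset the output uses (before PHP 5.4 its default was ISO-8859-1). The sketch below wraps it in a hypothetical helper, escape_html(). For SQL, mysql_real_escape_string() escapes according to the connection's character set, and the PHP manual notes that this charset must be set at the server level or with mysql_set_charset(), not only through a SET NAMES query, for the escaping to take it into account.

```php
<?php
// Hypothetical helper: escape untrusted text for HTML output,
// stating the charset explicitly instead of relying on the default.
function escape_html($text)
{
    // ENT_QUOTES escapes both double and single quotes.
    // With 'UTF-8' and no ENT_SUBSTITUTE/ENT_IGNORE flag,
    // invalid UTF-8 input yields an empty string.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = "<script>alert('x')</script> caf\xC3\xA9"; // "café" in UTF-8
$safe = escape_html($comment);
// $safe is "&lt;script&gt;alert(&#039;x&#039;)&lt;/script&gt; café"
```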

Thanks ahead of time.

Upvotes: 10

Views: 873

Answers (3)

Imran Omar Bukhsh

Reputation: 8071

When working with Unicode:

  • put <meta content="text/html; charset=utf-8" http-equiv="Content-Type" /> in the <head> of every page you output
  • right after you connect to your database, run the SQL query: mysql_query("set names 'utf8'");
  • make sure all tables and text columns use the utf8 character set with the collation 'utf8_unicode_ci'

Upvotes: -1

Dennis Kreminsky

Reputation: 2089

  1. Welcome aboard UTF-8 :)
  2. You should simply use the mb_* functions in place of your traditional str* functions.
  3. MySQL and its API have long supported UTF-8 well; the only requirement is that you specify the encoding when saving data and when connecting. Google for 'SET NAMES utf8'.
  4. Note the 'u' modifier for preg_* functions, which tells them to work in Unicode (UTF-8) mode.
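A small sketch of what the 'u' modifier changes, assuming the subject string is valid UTF-8 (with /u, preg_* functions return false on invalid UTF-8 input). The preg_split('//u', ...) line is a common idiom for splitting a string into characters, not something taken from the answer above.

```php
<?php
$name = "Jos\xC3\xA9"; // "José" in UTF-8: 4 characters, 5 bytes

// Without /u, "." matches a single byte, so the two-byte "é" counts twice.
$bytesMatched = preg_match('/^.{5}$/', $name);   // 1: five bytes
// With /u, pattern and subject are treated as UTF-8 characters.
$charsMatched = preg_match('/^.{4}$/u', $name);  // 1: four characters

// Unicode property classes such as \p{L} (any letter) need /u as well.
$allLetters = preg_match('/^\p{L}+$/u', $name);  // 1

// Split into characters rather than bytes.
$chars = preg_split('//u', $name, -1, PREG_SPLIT_NO_EMPTY);
```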

Upvotes: 5
