firebird

Reputation: 3511

Building software for international use?

I'm trying to see if our application, which was built for US users, can be expanded to support international users. Our application provides online training and course catalog and as far as international users go, we would need to collect their name and address.

If we update our Oracle database to use NVARCHAR columns (which support UTF-16) and use UTF-8 encoding for our web pages, then if someone enters an address in Chinese, how would we "convert" it to English (we only have English speakers on our staff)? Do we have to use Google Translate to convert it?

Also most of our webpage input fields have regex validators that only allow A-Z and 0-9 characters...I'm guessing those will need to be removed to support international users?

Upvotes: 0

Views: 52

Answers (2)

Paweł Dyda

Reputation: 18662

If you want to support Chinese, then converting the database text fields to NVARCHAR (I am sensing MS SQL here) is not enough. You need MS SQL 2012 or above to correctly support Supplementary Characters (required not only by Chinese but by a few other languages as well). You basically need to set up a collation ending with _SC (which, as you may guess, stands for "Supplementary Characters"). Only then will your database use UTF-16 (it will use UCS-2 otherwise).
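To make the UCS-2 versus UTF-16 distinction concrete, here is a minimal Python sketch (not tied to any particular database; the helper names are made up for illustration). It counts how many 16-bit code units a string needs in UTF-16 and flags characters outside the Basic Multilingual Plane, which are the ones a UCS-2-only store cannot treat as single characters.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def has_supplementary(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


bmp_char = "\u4e2d"        # U+4E2D, a common CJK ideograph inside the BMP
supp_char = "\U00020000"   # U+20000, a CJK Extension B ideograph

# A BMP character fits in one code unit; a supplementary one needs a
# surrogate pair, i.e. two code units.
print(utf16_code_units(bmp_char), has_supplementary(bmp_char))
print(utf16_code_units(supp_char), has_supplementary(supp_char))
```

A check like `has_supplementary` could be run over sample input to see whether supplementary characters actually occur in your data.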

That's one thing. I don't think I get what you said, but if you want to have international characters in URI (basically making it an IRI), these characters would be:

However, if you mean the domain name (a wise move to have a Chinese domain name), a different process takes place (only for the domain parts!). They are converted to so-called Punycode.
BTW, having a Chinese domain name does not necessarily mean using Chinese characters. Because of the problems with entering Chinese characters from the keyboard (Input Method Editors, anyone?), it is a bit better to have a domain name like 888.cn.
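As a rough illustration of the Punycode conversion, the sketch below uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules (newer registries may follow IDNA 2008, so treat this as an approximation). The function names and sample domains are only examples.

```python
def to_ascii_domain(domain: str) -> str:
    """Convert an internationalized domain name to its ASCII (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(domain: str) -> str:
    """Convert an ASCII (Punycode) domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


# Non-ASCII labels get an "xn--" prefix; ASCII labels pass through unchanged.
print(to_ascii_domain("b\u00fccher.example"))   # bücher.example
print(to_ascii_domain("\u4e2d\u6587.cn"))       # a Chinese label under .cn
print(to_ascii_domain("888.cn"))                # already ASCII
print(to_unicode_domain("xn--bcher-kva.example"))
```

Each label is converted separately, which is why only the non-ASCII parts of the name change.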

OK, re-reading your question, it seems that you're asking about postal addresses. Well, you should not, under any circumstances, convert the characters to English. What do you need this for? If you want to send an invoice (especially by snail mail), you actually need the correct Chinese address. You should not alter it.
If you need it to verify a credit card and your service provider accepts ASCII only, you should really think about changing service providers. No excuses, sorry. If you can't (for any reason), ask them how to prepare the data. They have to be held accountable for their bugs.

As for validators... Well, since it's .NET, I can't help much. You'll need to rewrite the regular expressions to handle international characters correctly.
BTW, if you're using something along the lines of [A-Za-z]+ [A-Za-z]+ for validating personal names, then Mr. O'Reilly would not be able to use your service. Similarly, an address like Roue de L'année 11 will also be rejected.
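A minimal sketch of a more permissive name check, written in Python for illustration rather than in the application's own stack. The pattern and the `is_plausible_name` helper are assumptions, not a complete validator: the check accepts letters from any script, separated by single spaces, apostrophes, or hyphens.

```python
import re
import unicodedata

# [^\W\d_] means "a word character that is neither a digit nor an underscore",
# which in Python 3 covers letters from any script.
# Separators allowed between letter runs: space, ASCII apostrophe,
# typographic apostrophe (U+2019), and hyphen.
NAME_RE = re.compile(r"[^\W\d_]+(?:[ '\u2019-][^\W\d_]+)*")


def is_plausible_name(raw: str) -> bool:
    """Loose sanity check for a personal name; deliberately permissive."""
    # NFC composes sequences such as "e" + combining acute into one letter.
    name = unicodedata.normalize("NFC", raw.strip())
    return NAME_RE.fullmatch(name) is not None


for candidate in ["O'Reilly", "Pawe\u0142 Dyda", "\u738b\u5c0f\u660e", "Anne-Marie", "R2D2", ""]:
    print(repr(candidate), is_plausible_name(candidate))
```

Even this is too strict for scripts that rely on combining marks with no precomposed form, so in practice it may be safer to validate only length and the absence of control characters.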

Upvotes: 1

Johan

Reputation: 389

There's no way you can convert a Chinese name or address to English, is there? That wouldn't make any sense. A lot of Chinese people also write their names in Latin letters, and I suspect this is what most people will do when registering on your site.

You could also read up on encodings, as UTF-8 and UTF-16 are quite different and encoding is a real jungle. Both can represent every Unicode code point, but UTF-16 needs surrogate pairs for characters outside the Basic Multilingual Plane, and older UCS-2 implementations cannot store those at all. See for example: http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings

What you should do will depend a lot on what you are going to do with the information that you save. I would suggest that it might be a good idea to start by looking at what limitations your business processes will impose. For example, if you are using the address to mail stuff, I guess the US Postal Service will be a bit confused if the address is written in Chinese characters.

Upvotes: 1
