Reputation: 33431
I am building a php web app that requires internationalization. I have decided to use get-text for system related strings and perhaps some database tables for user generated content.
For example, a user might be able to post a blog post. He should be able to post different versions of that post in different languages. I can implement this by storing all the posts in the posts table with an extra column denoting the language.
The difficult bit is trying to internationalize system strings stored in the database.
For example, I have a table that stores privileges. Each privilege should have a string that describes what this privilege does.
At the moment, it is stored in a table like this:
I plan to use an application like PoEdit to generate gettext files. It is able to search through all the php files to grab the strings. But in the cases like this where the string is stored in the database, it can be a fair bit of work to extract the string for transation. What are some tricks and solutions to handling this?
Finally, lets say I have some data types and forms that users can create and define in the app. For example, defining a "product type" for a shopping cart. This means that the product will have its own unique set of attributes and descriptions. These attributes will require translating along with the description.
The same case with forms. The user can create a form which might be stored in a set of tables. These forms then need to be translated.
What are some database models I can use to store translations for forms and product types?
Cheers :)
Upvotes: 5
Views: 774
Reputation: 11591
For strings that are more "system"-oriented, like :
Gettext is good.
Privileges, for instance, are more "system"-oriented (users don't create privileges, rather admins grant them to users but don't create new types of privileges). So, your privileges table can have a column "privilege_name" which is never displayed and contains only gettext keys, for instance : "Privilege : User can edit posts in the specified forum".
Strings in your application should also not be the text that the user will see, but something more detailed like "Menu: Edit Preferences".
Those strings go through gettext (even for english or the site's "mother tongue") and get translated to the proper user-visible strings.
You should also use numbered sprintf-style arguments, ie not "Price of %s is %s" but "Price of %(1)s is %(2)s".
This has several advantages :
If you have text in images (like some buttons) you need to take care of this too. Gettext can translate filenames, too (images/buttons/en/submit.png => images/buttons/fr/valider.png although a simple regexp would be nice too) and don't forget the for blind people using screen readers.
For multilingual user-generated content (stored in tables), the usual relational approach is better.
Table posts( post_id ... ) Table posts_translated( post_id foreign key, language_id foreign key, title, text, etc )
This will allow you to use SQL for everything like showing the list of languages available for a post, allowing default languages, showing untranslated posts, fulltext search, etc.
Upvotes: 2
Reputation: 5935
For a simpler solution, but without updating structures of your tables you can store serialized arrays with translations in each translated field. For example:
id title
1 a:2:{s:7:"english";s:12:"Hello World!";s:7:"spanish";s:14:"¡Hola, mundo!";}
Then, when the record is loaded by id, you simply deserializing contents by calling unserialize
and select required translation. You can also use json_encode
and json_decode
for serialization and deserialization. It may be done in your model or base model class.
But when you have to support a lot of languages, it can be a problem since the size of all fields should be multiplied by a number of languages.
EDIT: But, as peufeu kindly pointed out, this solution breaks fulltext search and does not allow you to check existing translations with simple SQL query. So if you use fulltext search or need to select content by 'translated' flag, don't use it.
Upvotes: -1
Reputation: 29649
Tricky question.
In my experience, the translation process is really hard to manage - and you don't want to hold up the release of your app because the Uzbek translation for the "Create posts" privilege hasn't been approved yet.
I've never worked with gettext, but the .Net and Java equivalents require you to put a file on the web server with translations; this is often treated as a deployment, and at the very least must go through the version control routines - this can also be a bit of a pain...
If you can get away with it, I'd agree on a convention of TableName_ColumnName for your gettext keys, and store all the user-visible system messages in gettext files.
I would argue that localization of system messages is a front-end concern, and shouldn't be in the database. Where your business entities - posts, shopping items etc. - can be localized, that's a domain issue and should be reflected in your database schema.
Upvotes: 1
Reputation: 5935
I think gettext is more appropriate to use for translation of messages, buttons, captions and so on. And not dynamic content. For such a purpose you can go several different ways depending on project requirements. First, you should decide:
For a solution with maximum flexibility and extensibility you can go folowing way. Original tables are staying as is, with no changes. For translations you add a table translations
with columns:
id
primary keyobject_table
name of translated tableobject_id
reference to translated objectlanguage
or language_id
if you need to manage languages dynamicallyfield
name of translated fieldoriginal_md5
hash of original field value (if you need to track changes) translation
author_id
, published
, date
and other fields required for moderation, if you need it.Then, in another table or configuration file you describe what tables and fields are to be translated. You can also describe types of fields, e.g. text
, textarea
, html
, file
and so on.
For example:
$translatedFields = array(
// here key stands for translated table name
'posts' => array(
// here key stands for translated field
// and value for field's type
'title' => 'text',
'body' => 'html',
),
);
Then, in your data access layer you determine current language and substitute all SELECT queries to tables which are to be translated. A bit of regexp magic and strict syntax of your queries can help you here. The query SELECT title, body FROM posts WHERE posts.id='1'
turns into
SELECT
IF(posts_title_translation.translation IS NULL,
posts.title,
posts_title_translation.translation) AS title,
IF(posts_body_translation.translation IS NULL,
posts.body,
posts_body_translation.translation) AS body
FROM posts
LEFT JOIN translations AS posts_title_translation
ON posts_title_translation.object_id = posts.id
AND posts_title_translation.object_table = 'posts'
AND posts_title_translation.language = '$language'
AND posts_title_translation.field = 'title'
---- And if you need premoderation, then filter off unpublished translations
--AND posts_title_translation.published
LEFT JOIN translations AS posts_body_translation
ON posts_body_translation.object_id = posts.id
AND posts_body_translation.object_table = 'posts'
AND posts_body_translation.language = '$language'
AND posts_body_translation.field = 'body'
--AND posts_body_translation.published
WHERE posts.id = '1'
(The IF expression in SELECT section allows you to select original field when there is no translation of it done or published).
This is how you can have a flexible i18n system. Translations are made for each separate field and automatically substituted in data access layer while selecting.
It's a bit tricky, and, must say, partly influensed by Joom!Fish extension for Joomla!, but that's how I'd do this.
I'll add another answer for a simpler solution, because this is already too large.
Upvotes: -1