Reputation: 15464
I will be indexing posts in Elasticsearch. For now there are two languages: English and Chinese. Each post has one translation (English) or two, plus some data that is common to both languages. My question is: how should I index posts?
1. Create two indexes, posts-en and posts-cn, and store posts separately?
2. Create a single index, posts, and keep the data in a format like this:
{
    commonParam1: 1,
    commonParam2: "somevalue",
    ...
    titleEn: "English title",
    titleCn: "Chinese title",
    contentEn: "Content EN",
    contentCn: "Content CN",
    ...
}
Upvotes: 2
Views: 341
Reputation: 27517
Unless you have a compelling reason to split a single document across two indexes, I'd strongly advise keeping it all in one index.
With one index you can easily use a different analyzer for each language-specific field, and adding mappings for new languages later is fairly straightforward. You can index each document in a single call rather than two (one per language), as you would with separate indexes. You also avoid duplicating the common data.
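As a rough illustration of the per-field analyzer idea, here is a minimal sketch that builds a create-index request body for a single posts index. The field names come from the question; everything else is an assumption on my part: the types chosen for the common fields are placeholders, the typeless `mappings.properties` layout and the `text` field type assume a recent Elasticsearch version, and picking the built-in `cjk` analyzer for Chinese is only one option (a dedicated Chinese analysis plugin may suit you better). The helper name `build_posts_index_body` is hypothetical.

```python
import json

# Analyzer per language suffix. "english" and "cjk" are built-in Elasticsearch
# analyzers; choosing "cjk" for Chinese is an assumption, not a recommendation.
ANALYZERS = {"En": "english", "Cn": "cjk"}

# Language-specific text fields, taken from the document format in the question.
TEXT_FIELDS = ["title", "content"]


def build_posts_index_body(analyzers):
    """Return a create-index body with one analyzed text field per language."""
    properties = {
        # Placeholder types for the shared fields; adjust to the real data.
        "commonParam1": {"type": "integer"},
        "commonParam2": {"type": "keyword"},
    }
    for suffix, analyzer in analyzers.items():
        for field in TEXT_FIELDS:
            # e.g. titleEn -> english analyzer, titleCn -> cjk analyzer
            properties[field + suffix] = {"type": "text", "analyzer": analyzer}
    return {"mappings": {"properties": properties}}


index_body = build_posts_index_body(ANALYZERS)
print(json.dumps(index_body, indent=2))
```

The printed JSON could be sent as the body of the request that creates the posts index. Supporting another language would then be a matter of adding one entry to `ANALYZERS` (for example `"De": "german"`, which is also a built-in analyzer) and updating the mapping accordingly.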
I'd also take a good look at this post: http://gibrown.wordpress.com/2013/05/01/three-principles-for-multilingal-indexing-in-elasticsearch/
It's a good discussion of analyzing and indexing multiple languages in Elasticsearch.
Upvotes: 1