user606521

Reputation: 15464

Multilanguage elastic search

I will be indexing posts in Elasticsearch. For now there are two languages: English and Chinese. Each post has one translation (English) or two, plus some data that is common to both languages. My question is: how should I index the posts?

  1. Create two indices: posts-en and posts-cn and store posts separately?
  2. Create single index posts and keep data in format like this:

    {
      commonParam1: 1,
      commonParam2: "somevalue",
      ...
      titleEn: "English title",
      titleCn: "Chinese title",
      contentEn: "Content EN",
      contentCn: "Content CN",
      ...
    }
    

Upvotes: 2

Views: 341

Answers (1)

John Petrone

Reputation: 27517

Unless you have a compelling reason to split a single document across two indexes, I'd strongly advise keeping it all in one index.

With one index you can easily use a different analyzer for each language-specific field. Adding mappings for new languages in the future is fairly straightforward. You can index each document in a single call, as opposed to two (one per language) if you index separately. You also avoid duplicating the common data.
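As a rough sketch of what the single-index mapping could look like, here is a small Python helper that builds the mapping body as a plain dict. The helper name `build_posts_mapping` and the field types chosen for `commonParam1` and `commonParam2` are my assumptions, not something from your data. `english`, `cjk` and `german` are built-in Elasticsearch analyzers; for Chinese you may prefer a plugin analyzer instead. The `text`/`keyword` types and the typeless `mappings` layout assume a recent Elasticsearch version (older releases used `string` and a type name), so adjust for yours.

```python
import json

# Assumption: one analyzer per language suffix used in the field names.
# "english" and "cjk" are built-in Elasticsearch analyzers.
ANALYZERS = {"En": "english", "Cn": "cjk"}


def build_posts_mapping(analyzers=ANALYZERS):
    """Build a mapping body for a single `posts` index (hypothetical helper)."""
    # Fields shared by all translations; the types here are guesses.
    properties = {
        "commonParam1": {"type": "integer"},
        "commonParam2": {"type": "keyword"},
    }
    # One title/content pair per language, each with its own analyzer.
    for suffix, analyzer in analyzers.items():
        for field in ("title", "content"):
            properties[field + suffix] = {"type": "text", "analyzer": analyzer}
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    # Print the JSON body you would send when creating the index.
    print(json.dumps(build_posts_mapping(), indent=2))
```

Adding a language later is then just one more entry, e.g. `build_posts_mapping({**ANALYZERS, "De": "german"})`, which adds `titleDe` and `contentDe` without touching the existing fields.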

I'd also take a good look at this post: http://gibrown.wordpress.com/2013/05/01/three-principles-for-multilingal-indexing-in-elasticsearch/

It's a good discussion of analyzing and indexing multiple languages in Elasticsearch.

Upvotes: 1
