SSG

Reputation: 1515

Elasticsearch: More indices vs. more types

We are using Elasticsearch for the following use case.
Elasticsearch version: 5.1.1
Note: We are using AWS managed Elasticsearch.

We have a multi-tenant system in which each tenant stores several kinds of data, and the number of tenants increases day by day.

For example, each tenant will have the following information:

1] tickets
2] sw_inventory
3] hw_inventory

The current indexing strategy is as follows:

Index name:
tenant_id (GUID), e.g. tenant_xx1234xx-5b6x-4982-889a-667a758499c8

Types:

1] tickets
2] sw_inventory
3] hw_inventory

Issues we are facing:

1] Mapping conflicts for common fields (e.g. id, name, userId) across the types (tickets, sw_inventory, hw_inventory).
2] As the number of tenants increases, the number of indices may reach 1000 or even 2000.
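For context on issue 1: in 5.x, fields that share a name across mapping types of the same index are backed by the same Lucene field, so they must be mapped the same way. A rough Python sketch of spotting such clashes before creating the index follows. The per-type field datatypes are made up for illustration, and only the datatype is compared (real mappings have more parameters).

```python
from typing import Dict, List, Tuple

# Hypothetical per-type field datatypes for one tenant index (illustrative only).
Mappings = Dict[str, Dict[str, str]]


def find_field_conflicts(mappings: Mappings) -> Dict[str, List[Tuple[str, str]]]:
    """Return fields whose datatype differs between mapping types of one index.

    Maps each conflicting field name to a sorted list of (type_name, datatype).
    """
    seen: Dict[str, List[Tuple[str, str]]] = {}
    for type_name, fields in mappings.items():
        for field, datatype in fields.items():
            seen.setdefault(field, []).append((type_name, datatype))
    # Keep only fields used with more than one distinct datatype.
    return {
        field: sorted(usages)
        for field, usages in seen.items()
        if len({datatype for _, datatype in usages}) > 1
    }


if __name__ == "__main__":
    tenant_index = {
        "tickets": {"id": "keyword", "name": "text", "userId": "keyword"},
        "sw_inventory": {"id": "long", "name": "text", "userId": "keyword"},
        "hw_inventory": {"id": "keyword", "name": "keyword", "userId": "keyword"},
    }
    print(find_field_conflicts(tenant_index))
```

Here "id" and "name" are reported, while "userId" is mapped consistently and is not.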

Would it be a good idea to reverse the indexing strategy?

For example, index names:

1] tickets
2] sw_inventory
3] hw_inventory

Types:

tenant_tenant_id1
tenant_tenant_id2
tenant_tenant_id3
tenant_tenant_id4

So there would be only 3 huge indices, each with N types (one type per tenant).

So the question is: which solution is better?

1] Many small indices with 3 types each
OR
2] 3 huge indices with many types

Regards

Upvotes: 10

Views: 3093

Answers (4)

Luka Lopusina

Reputation: 2647

Indices created in Elasticsearch 6.0.0 or later may only contain a single mapping type, which means that doc_type (_type) is deprecated.

You can find the full explanation here, but in summary there are two alternatives:

Index per document type

This approach has two benefits:

  • Data is more likely to be dense and so benefit from compression techniques used in Lucene.
  • The term statistics used for scoring in full text search are more likely to be accurate because all documents in the same index represent a single entity.

Custom type field

Of course, there is a limit to how many primary shards can exist in a cluster so you may not want to waste an entire shard for a collection of only a few thousand documents. In this case, you can implement your own custom type field which will work in a similar way to the old _type.

PUT twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "type": { "type": "keyword" }, 
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" },
        "content": { "type": "text" },
        "tweeted_at": { "type": "date" }
      }
    }
  }
}
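The snippet above only defines the mapping. As a rough sketch (Python that builds the JSON bodies rather than sending them), each document carries its kind in the custom "type" field, and a search that used to be scoped by _type becomes a term filter on that field. The helper names and the value "tweet" are hypothetical.

```python
import json
from typing import Any, Dict


def typed_doc(doc_type: str, **fields: Any) -> Dict[str, Any]:
    """Build a document that records its entity kind in the custom `type` field."""
    return {"type": doc_type, **fields}


def search_by_type(doc_type: str, text: str) -> Dict[str, Any]:
    """Full-text match on `content`, restricted to one custom type.

    The term filter on the keyword field `type` plays the role of the old _type.
    Filter clauses do not affect scoring.
    """
    return {
        "query": {
            "bool": {
                "must": {"match": {"content": text}},
                "filter": {"term": {"type": doc_type}},
            }
        }
    }


if __name__ == "__main__":
    # Body for e.g. PUT twitter/_doc/<id>
    print(json.dumps(typed_doc("tweet", user_name="alice", content="hello")))
    # Body for GET twitter/_search
    print(json.dumps(search_by_type("tweet", "hello"), indent=2))
```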

You are using an older version of Elasticsearch, but the same logic applies, and it will make it easier for you to move to a newer version when you decide to. So I think you should go with the separate-index structure, in other words 3 huge indices where the tenant is stored as a field in the mapping rather than as a _type.

Upvotes: 0

Andrei Stefan

Reputation: 52366

I suggest a different approach: https://www.elastic.co/guide/en/elasticsearch/guide/master/faking-it.html

That means custom routing: each document has a tenant_id or similar (something unique to each tenant), and you use it both for routing and for defining an alias for each tenant. Then, when querying documents for a specific tenant only, you use that tenant's alias.
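A minimal Python sketch of the per-tenant aliases, building the body for POST /_aliases. Each alias combines a term filter on tenant_id with a routing value equal to the tenant id, so searches through the alias only see that tenant's documents and go to the shard that routing value maps to. The index name, the "tenant_<id>" alias naming, and the assumption that tenant_id is mapped as keyword are all hypothetical choices for illustration.

```python
import json
from typing import Any, Dict, Iterable


def tenant_alias_actions(index: str, tenant_ids: Iterable[str]) -> Dict[str, Any]:
    """Body for POST /_aliases: one filtered, routed alias per tenant.

    Assumes `tenant_id` is a keyword field so the term filter matches exactly.
    """
    actions = []
    for tenant_id in tenant_ids:
        actions.append({
            "add": {
                "index": index,
                "alias": f"tenant_{tenant_id}",  # hypothetical naming scheme
                "filter": {"term": {"tenant_id": tenant_id}},
                "routing": tenant_id,
            }
        })
    return {"actions": actions}


if __name__ == "__main__":
    body = tenant_alias_actions("shared_index", ["xx1234xx", "yy5678yy"])
    print(json.dumps(body, indent=2))
```

Adding a new tenant is then just one more "add" action, with no new index or mapping.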

This way you use one index and one type. To choose the number of shards, consider the current index size and the number of nodes, and aim for shards that are spread more or less evenly across all data nodes while, according to your tests, performance stays acceptable. If, in the future, the index grows too large and the shards become too big to keep the same performance, consider creating a new index with more primary shards and reindexing everything into it. This is a well-known, commonly used approach, not a discouraged one.

1000-2000 aliases are nothing in terms of what the cluster can handle. If you have close to 10 nodes or more, I also recommend dedicated master nodes with something like a 4-6GB heap and at least 4 CPU cores.
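One way to turn the sizing advice into a number, sketched in Python: pick the smallest multiple of the data node count that keeps each primary shard under a target size, so primaries can spread evenly. The 30 GB default is an assumed target, not an Elasticsearch setting, and actual shard placement is still decided by the cluster. The second helper builds the body for the _reindex API when you later move to an index with more primaries; the index names are hypothetical.

```python
import math
from typing import Any, Dict


def primary_shard_count(index_size_gb: float, data_nodes: int,
                        max_shard_gb: float = 30.0) -> int:
    """Smallest multiple of data_nodes keeping each primary under max_shard_gb.

    max_shard_gb is an assumed target size, chosen for illustration.
    """
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    needed = max(1, math.ceil(index_size_gb / max_shard_gb))
    # Round up to a multiple of the node count so primaries divide evenly.
    return math.ceil(needed / data_nodes) * data_nodes


def reindex_body(old_index: str, new_index: str) -> Dict[str, Any]:
    """Body for POST _reindex, copying every document into the new index."""
    return {"source": {"index": old_index}, "dest": {"index": new_index}}


if __name__ == "__main__":
    print(primary_shard_count(500, 6))  # 17 shards needed, rounded up to 18
    print(reindex_body("tenants_v1", "tenants_v2"))
```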

Upvotes: 6

Fernando Fernandez

Reputation: 41

I think both strategies have pros and cons:

Multiple Indexes:

Pros:
  - Tenant data is isolated from other tenants, and no query would return results from more than one.
  - If the total number of documents is very large, several smaller indexes could give better performance.

Cons: Harder to manage. If each index has few documents, you may be wasting a lot of resources.

EDITED: Avoid multiple types in the same index, as per the comments on performance and the deprecation of the feature.

Upvotes: -1

ryanlutgen

Reputation: 3051

Neither approach would work. As others have mentioned, both approaches cost performance and would prevent you from upgrading.

Consider having one index and type for each set of data, e.g. sw_inventory, and then a field within the mapping that identifies the tenant. You can then use document-level security in a security plugin such as X-Pack or Search Guard to prevent one tenant from seeing another's records (if required).
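Without a security plugin, the application itself has to add the tenant restriction to every query it sends. A minimal Python sketch of that wrapper, assuming a keyword field named tenant_id (a hypothetical name) in each per-dataset index such as sw_inventory. This is an application-side filter, not the plugins' document-level security.

```python
import json
from typing import Any, Dict


def scope_to_tenant(query: Dict[str, Any], tenant_id: str) -> Dict[str, Any]:
    """Wrap an arbitrary query so it only matches one tenant's documents.

    The tenant restriction goes in a bool filter clause, so it does not
    change relevance scores of the wrapped query.
    """
    return {
        "bool": {
            "must": query,
            "filter": {"term": {"tenant_id": tenant_id}},
        }
    }


if __name__ == "__main__":
    user_query = {"match": {"name": "laptop"}}
    # Body for GET sw_inventory/_search
    print(json.dumps({"query": scope_to_tenant(user_query, "xx1234xx")}, indent=2))
```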

Upvotes: 4
