Reputation: 1515
We are using Elasticsearch for the following use case.
Elasticsearch Version : 5.1.1
Note: We are using AWS managed ElasticSearch
We have a multi-tenant system in which each tenant stores several kinds of data, and the number of tenants grows every day.
For example, each tenant has the following information:
1] tickets
2] sw_inventory
3] hw_inventory
The current indexing strategy is as follows:
index name:
tenant_id (GUID), e.g. tenant_xx1234xx-5b6x-4982-889a-667a758499c8
types:
1] tickets
2] sw_inventory
3] hw_inventory
Issues we are facing:
1] Mapping conflicts for common fields (e.g. id, name, userId) that appear in several types (tickets, sw_inventory, hw_inventory)
2] As the number of tenants increases, the number of indices could reach 1000 or even 2000.
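The first issue arises because, in Elasticsearch 5.x, fields with the same name in different types of one index must share the same mapping. As a rough illustration, the sketch below compares per-type field datatypes and reports the fields that disagree. The datatypes in `mappings` and the helper `find_conflicts` are hypothetical; the original post does not state the actual mappings.

```python
from collections import defaultdict

# Hypothetical per-type mappings for a single tenant index.
# The datatypes are assumptions made up for illustration.
mappings = {
    "tickets": {"id": "long", "name": "text", "userId": "keyword"},
    "sw_inventory": {"id": "keyword", "name": "text", "userId": "keyword"},
    "hw_inventory": {"id": "keyword", "name": "keyword", "userId": "keyword"},
}


def find_conflicts(type_mappings):
    """Return the fields whose declared datatype differs between mapping types."""
    seen = defaultdict(dict)  # field -> {datatype: [types declaring it]}
    for type_name, fields in type_mappings.items():
        for field, datatype in fields.items():
            seen[field].setdefault(datatype, []).append(type_name)
    # A field conflicts when more than one datatype was declared for it.
    return {field: by_type for field, by_type in seen.items() if len(by_type) > 1}


conflicts = find_conflicts(mappings)
for field, by_type in sorted(conflicts.items()):
    print(field, by_type)
```

With these made-up mappings, `id` and `name` are reported while `userId` is not, since every type declares it the same way.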
Would it be a good idea to reverse the indexing strategy?
For example, index names:
1] tickets
2] sw_inventory
3] hw_inventory
types:
tenant_tenant_id1
tenant_tenant_id2
tenant_tenant_id3
tenant_tenant_id4
That would leave only 3 huge indices, each with N types (one per tenant).
So the question is: which solution is better?
1] Many small indices and 3 types
OR
2] 3 huge indices and many types
Regards
Upvotes: 10
Views: 3093
Reputation: 2647
Indices created in Elasticsearch 6.0.0 or later may only contain a single mapping type which means that doc_type (_type) is deprecated.
The full explanation is in the Elastic documentation on the removal of mapping types, but in summary there are two solutions:
Index per document type
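This approach has two benefits:
- Data is more likely to be dense, so it benefits from the compression techniques used in Lucene.
- The term statistics used for scoring in full-text search are more likely to be accurate, because all documents in the same index represent a single entity.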
Custom type field
Of course, there is a limit to how many primary shards can exist in a cluster so you may not want to waste an entire shard for a collection of only a few thousand documents. In this case, you can implement your own custom type field which will work in a similar way to the old _type.
PUT twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "type": { "type": "keyword" },
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" },
        "content": { "type": "text" },
        "tweeted_at": { "type": "date" }
      }
    }
  }
}
You are on an older version of Elasticsearch, but the same logic applies, and it will make it easier to move to a newer version when you decide to. So I think you should go with the separate-index structure, in other words 3 huge indices and many types, but with the type stored as a field in the mapping rather than as _type.
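To make the custom type field concrete, here is a minimal sketch of a search body that restricts results to one value of that field. It reuses the `type` and `content` fields from the mapping above; the helper name `search_by_type` and the value `"tweet"` are assumptions for illustration only. The code just builds the JSON body and does not talk to a cluster.

```python
import json


def search_by_type(doc_type, text):
    """Build a search body that matches `text` in `content`,
    limited to documents whose custom `type` field equals `doc_type`."""
    return {
        "query": {
            "bool": {
                # Scored full-text match on the analyzed field.
                "must": {"match": {"content": text}},
                # Non-scoring exact match on the keyword field that replaces _type.
                "filter": {"term": {"type": doc_type}},
            }
        }
    }


body = search_by_type("tweet", "elasticsearch")
print(json.dumps(body, indent=2))
```

The printed body could be sent to the index's `_search` endpoint. In the multi-tenant case from the question, the filtered field would hold the tenant identifier instead.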
Upvotes: 0
Reputation: 52366
I suggest a different approach: https://www.elastic.co/guide/en/elasticsearch/guide/master/faking-it.html
Meaning custom routing, where each document has a tenant_id or similar (something that is unique to each tenant), and you use that both for routing and for defining an alias for each tenant. Then, when querying documents only for a specific tenant, you use the alias.
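As a sketch of what such per-tenant aliases might look like, the snippet below builds a request body for the `_aliases` endpoint, where each alias carries a term filter and a routing value for one tenant. The index name `shared_data`, the `tenant_` alias prefix, and the helper `tenant_alias_action` are hypothetical, and the sketch assumes `tenant_id` is mapped as a `keyword` field so the term filter matches exactly.

```python
import json


def tenant_alias_action(index, tenant_id):
    """Build one 'add' action for a filtered, routed alias for a single tenant."""
    return {
        "add": {
            "index": index,
            "alias": "tenant_" + tenant_id,
            # Only documents belonging to this tenant are visible through the alias.
            "filter": {"term": {"tenant_id": tenant_id}},
            # Index and search requests made via the alias use this routing value.
            "routing": tenant_id,
        }
    }


# Hypothetical tenant list; the GUID is the example from the question.
tenants = ["xx1234xx-5b6x-4982-889a-667a758499c8"]
body = {"actions": [tenant_alias_action("shared_data", t) for t in tenants]}
print(json.dumps(body, indent=2))
```

Posting a body like this to `_aliases` would create one alias per tenant, and requests made through an alias would then be filtered and routed for that tenant.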
You are going to use one index and one type this way. Choose the number of primary shards based on the expected index size and the number of nodes, so that the shards are spread more or less evenly across all data nodes and, according to your tests, performance is acceptable. If the index later grows so large that the shards become too big to sustain that performance, consider creating a new index with more primary shards and reindexing everything into it. This is a common and accepted approach.
1000-2000 aliases are easily handled. If you have close to 10 nodes or more, I also recommend dedicated master nodes with roughly 4-6 GB of heap and at least 4 CPU cores.
Upvotes: 6
Reputation: 41
I think both strategies have pros and cons:
Multiple Indexes:
Pros:
- Tenant data is isolated from the others, and no query would return results from more than one tenant.
- If the total number of documents is very large, several smaller indexes could give better performance.
Cons:
- Harder to manage.
- If each index has few documents, you may be wasting a lot of resources.
EDITED: Avoid multiple types in the same index, as per the comments on performance and the deprecation of the feature.
Upvotes: -1
Reputation: 3051
Neither approach would work. As others have mentioned, both approaches cost performance and would prevent you from upgrading.
Consider having one index and type for each set of data, e.g. sw_inventory, and then having a field within the mapping that differentiates between tenants. You can then use document-level security in a security plugin like X-Pack or Search Guard to prevent one tenant from seeing another's records (if required).
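As a rough sketch of the document-level security idea, the snippet below builds a role body in the shape used by X-Pack role definitions, where an index permission carries a query that limits which documents the role can read. The field name `tenant_id` and the helper `tenant_role` are assumptions; Search Guard uses its own configuration format, so treat this as an illustration and check your plugin's documentation for the exact syntax.

```python
import json


def tenant_role(index, tenant_id):
    """Build a role body granting read access to one tenant's documents only."""
    return {
        "indices": [
            {
                "names": [index],
                "privileges": ["read"],
                # Document-level restriction: only documents whose tenant
                # field (assumed to be `tenant_id`) matches are visible.
                "query": {"term": {"tenant_id": tenant_id}},
            }
        ]
    }


role = tenant_role("sw_inventory", "xx1234xx-5b6x-4982-889a-667a758499c8")
print(json.dumps(role, indent=2))
```

One such role per tenant would be assigned to that tenant's users, so the restriction is enforced by the cluster rather than by each query.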
Upvotes: 4