Reputation: 165
My application is a survey creation app where users can create surveys with many questions of different types. Each survey can then be shared with any number of people, and each response is recorded as a document like the one below:
{
"id" : 256, // submission id
"timeTaken" : "39.00",
"startTime" : "2020-07-19T05:37:38.873Z",
"state" : "COMPLETED",
"completedTime" : "2020-07-19T05:38:17.873Z",
"deviceType" : "COMPUTER",
"ip" : null,
"account_id" : 2,
"channel_id" : 48,
"contact_id" : null,
"survey_id" : 10,
"trigger_id" : 93,
"trigger_contact_id" : null,
"locked" : false,
"location" : null,
"language" : null,
"submission_id" : 256,
"question_90" : {
"skipped" : false,
"answer_choices" : [ 79 ]
},
"question_122" : {
"skipped" : false,
"otherChoice" : null,
"answer_choices" : [ 115, 113, 111, 110, 114 ]
},
"question_106" : {
"skipped" : false,
"answer_choices" : [
85
]
},
"question_120" : {
"answer_txt": "Great service",
"skipped" : false
},
"question_118" : {
"answer_txt": "Hello people",
"skipped" : false
},
"question_121" : {
"skipped" : false,
"answer_date" : "2020-06-04T20:01:49.783Z",
"answer_timezone" : 330
},
"question_108" : {
"skipped" : false,
"answer_int" : "93"
},
"question_105" : {
"skipped" : false,
"answer_string" : "+1 202 9932219"
},
"question_93" : {
"skipped" : false,
"answer_string" : "[email protected]"
},
"question_117" : {
"skipped" : false
},
"question_92" : {
"skipped" : false,
"answer_txt" : "composite"
},
"question_107" : {
"skipped" : false,
"answer_bool" : true
}
}
Initially I created one index per survey, but that turned out to be a bad idea: each index was allocated 5 shards, and my application had nearly 20k surveys created by users. Amazon Elasticsearch Service broke down and reported that 60k shards had been created on my 2 nodes.
Given this, I have no idea how to create my index or how to partition it meaningfully for efficient querying later on.
Can anyone share some insights, or ask me more questions so that I can update the question for clarity?
Upvotes: 0
Views: 121
Reputation: 32386
It looks like you are using an Elasticsearch version earlier than 7.x, where the default number of primary shards per index was 5. That default was changed to 1 in 7.0, and one of the reasons was exactly your problem: a large number of small shards hurts Elasticsearch performance.
Ideally, you should create just one index for all your surveys and roll over to a new index based on time range or size.
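As a rough sketch of what a time- or size-based rollover could look like, the snippet below builds the request for the Elasticsearch rollover API (`POST /<alias>/_rollover`). The alias name `survey-submissions` and the thresholds `30d` and `20gb` are placeholders I made up for illustration, not recommendations; pick values that fit your data volume. The code only builds and prints the request, so you can send it with whatever HTTP client you already use.

```python
import json


def build_rollover_request(alias, max_age="30d", max_size="20gb", primary_shards=1):
    """Return (method, path, body) for a rollover call on a write alias.

    All argument defaults are illustrative assumptions, not tuned values.
    """
    body = {
        # Roll over to a new index once ANY of these conditions is met.
        "conditions": {"max_age": max_age, "max_size": max_size},
        # Settings applied to the newly created index.
        "settings": {"index.number_of_shards": primary_shards},
    }
    return "POST", f"/{alias}/_rollover", body


# "survey-submissions" is a hypothetical alias name.
method, path, body = build_rollover_request("survey-submissions")
print(method, path)
print(json.dumps(body, indent=2))
```

This assumes the alias already points at a single write index whose name ends in a number (for example `survey-submissions-000001`), so that Elasticsearch can derive the next index name by itself.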
You need to have survey_id (the unique identifier of the survey) in every document of your single index. When querying the index, use survey_id in filter context to get better query performance: filter clauses skip relevance scoring, and frequently used filters are cached automatically.
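A minimal sketch of such a query body, built in Python so it is easy to parameterize. The helper name `build_survey_query` is my own, and the optional `state` filter assumes that `state` is mapped as a `keyword` field (with a default dynamic mapping you would likely need `state.keyword` instead). The field names `survey_id` and `state` come from the sample document in the question.

```python
import json


def build_survey_query(survey_id, state=None):
    """Build a search body that restricts results to one survey.

    Every clause goes under bool.filter, so it runs in filter context
    (no scoring, eligible for caching).
    """
    filters = [{"term": {"survey_id": survey_id}}]
    if state is not None:
        # Assumes `state` is a keyword field; adjust the field name if not.
        filters.append({"term": {"state": state}})
    return {"query": {"bool": {"filter": filters}}}


query = build_survey_query(10, state="COMPLETED")
print(json.dumps(query, indent=2))
```

The resulting body can be sent to the `_search` endpoint of the single index (or of the rollover alias, so that all rolled-over indices are searched together).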
Upvotes: 1