kishan maharana

Reputation: 557

How to handle searching large data in a nested field in Elasticsearch

I have been stuck on a scenario and haven't found a proper solution. Here is the problem I am facing with Elasticsearch. Any help would be appreciated.

  1. I have two indexes: one is video and the other is subtitles. A video can have multiple chunks of subtitles, which I store in the subtitles index along with the video_id.
  2. Now, when I search for anything, I want to search the video title, the description, and its subtitles as well.
  3. But as far as I know, ES does not support relational mapping, so I was not able to search the subtitles the way I need.
  4. Then I tried storing all of a video's subtitles inside the video index as a nested array. But in the long term, a long video might have a lot of subtitles, which would make the document heavier and hurt performance.
  5. Please consider that a single video's subtitles can be up to 1 GB.

So I need your help to find a solution. Thank you.

Upvotes: 0

Views: 92

Answers (1)

Sam

Reputation: 263

Since ES is NoSQL, you won't get relational search features. There are a few ways you can resolve this:

  1. The one you mentioned: store everything (the video plus its subtitles) in the same document, but that increases the document size. By default, ES caps an HTTP request body at 100 MB (http.max_content_length); you can raise that, but not beyond 2 GB. With subtitles of up to 1 GB per video, your data (video + subtitles) could eventually exceed that limit.
  2. Store the data separately in the two indices, video and subtitles, and keep a video_id field in every subtitle document. When searching, you then run two queries: one to get the video data, es.get(index="video", id=video_id, _source_includes=["video_title", "video_description"]), and one to get its subtitles, es.search(index="subtitles", body={"query": {"term": {"video_id.keyword": video_id}}})["hits"]["hits"]. This way you get the video's title, description, and subtitles. FYI, _source_includes returns only the listed fields, which performs better when the document is large. Note that index names must be lowercase, and a search returns only 10 hits by default, so set size (or paginate) when a video has many subtitle chunks.
  3. ES also has a parent/child concept: the parent (video) and its children (subtitles) are stored as separate documents in the same index, linked by a join field in the mapping, and each child is routed to its parent's shard. The cost is query performance: has_child and has_parent queries are slower than queries on flat or nested documents, because ES has to resolve the join at query time. For details, see this blog: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
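As a rough illustration of the parent/child option, here is a minimal sketch that only builds the request bodies, so it runs without a cluster. The index name "videos", the join field "video_relation", and the field names title, description and subtitle_text are all hypothetical; the join type, the relations mapping, routing, and the has_child query are standard Elasticsearch features. Note that this requires a single index rather than the separate video and subtitles indices from the question.

```python
# Hypothetical names for illustration: one index "videos" holding both
# video (parent) and subtitle (child) documents, joined by "video_relation".
INDEX = "videos"
JOIN_FIELD = "video_relation"


def join_mapping() -> dict:
    """Index mapping declaring 'video' as the parent of 'subtitle'."""
    return {
        "properties": {
            "title": {"type": "text"},
            "description": {"type": "text"},
            "subtitle_text": {"type": "text"},
            JOIN_FIELD: {"type": "join", "relations": {"video": "subtitle"}},
        }
    }


def video_doc(title: str, description: str) -> dict:
    """Parent document: only names its relation."""
    return {"title": title, "description": description, JOIN_FIELD: "video"}


def subtitle_doc(video_id: str, text: str) -> dict:
    """Child document: names its relation and its parent's _id.

    It must also be indexed with routing=video_id so it lands on the
    same shard as its parent.
    """
    return {"subtitle_text": text, JOIN_FIELD: {"name": "subtitle", "parent": video_id}}


def search_videos_query(text: str) -> dict:
    """Match videos whose title/description match, or any of whose subtitles match."""
    return {
        "bool": {
            "should": [
                {"multi_match": {"query": text, "fields": ["title", "description"]}},
                {
                    "has_child": {
                        "type": "subtitle",
                        "query": {"match": {"subtitle_text": text}},
                        "score_mode": "max",  # score a video by its best subtitle chunk
                    }
                },
            ],
            "minimum_should_match": 1,
        }
    }


# Usage, assuming an elasticsearch-py 8.x client named `es`:
# es.indices.create(index=INDEX, mappings=join_mapping())
# es.index(index=INDEX, id="v1", document=video_doc("Intro", "First lesson"))
# es.index(index=INDEX, document=subtitle_doc("v1", "hello world"), routing="v1")
# es.search(index=INDEX, query=search_videos_query("hello"))
```

Only video documents come back, because subtitle documents have no title or description and has_child matches parents. Adding "inner_hits": {} to the has_child clause should additionally return which subtitle chunks matched.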

PS: I am not sure what you are working on that requires this, but I would personally prefer the second approach, since trying to make the indices relational will end up costing performance.
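For the second approach, when the goal is a single ranked list of videos for a search term (rather than fetching one known video), one option is to query both indices and merge the hits by video_id on the client side. This is a minimal sketch of that merge step; the field names video_title, video_description, subtitle_text and video_id are assumptions, and the functions only build queries and combine hit dictionaries, so they run without a cluster.

```python
from collections import defaultdict

# Field names are assumptions: "video_title"/"video_description" in the
# "video" index, and "subtitle_text"/"video_id" in the "subtitles" index.
VIDEO_FIELDS = ["video_title", "video_description"]


def video_query(text: str) -> dict:
    """Full-text query against the video index."""
    return {"multi_match": {"query": text, "fields": VIDEO_FIELDS}}


def subtitle_query(text: str) -> dict:
    """Full-text query against the subtitle chunks."""
    return {"match": {"subtitle_text": text}}


def merge_hits(video_hits: list, subtitle_hits: list) -> list:
    """Combine hits from both indices into one ranked entry per video_id.

    Video hits are keyed by their _id; subtitle hits by the video_id stored
    in their _source. A video's score is the best score among its hits.
    """
    merged = defaultdict(lambda: {"score": 0.0, "subtitle_ids": []})
    for hit in video_hits:
        entry = merged[hit["_id"]]
        entry["score"] = max(entry["score"], hit["_score"])
    for hit in subtitle_hits:
        entry = merged[hit["_source"]["video_id"]]
        entry["score"] = max(entry["score"], hit["_score"])
        entry["subtitle_ids"].append(hit["_id"])
    # Highest-scoring videos first.
    return sorted(merged.items(), key=lambda item: item[1]["score"], reverse=True)


# Usage, assuming an elasticsearch-py 8.x client named `es`:
# videos = es.search(index="video", query=video_query("hello"))["hits"]["hits"]
# subs = es.search(index="subtitles", query=subtitle_query("hello"), size=100)["hits"]["hits"]
# ranked = merge_hits(videos, subs)
```

Taking the maximum score is only a heuristic: relevance scores from two different indices are computed from different term statistics, so they are not strictly comparable. Videos that matched only through subtitles still need their title and description fetched afterwards, for example with one get per video_id.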

Upvotes: 1
