I am building search functionality for two types of related documents, which I'll call "blogs" and "posts": a blog is a website with a bunch of posts, and a post is a specific entry written in that blog. I'd like to be able to search against both of them. In a relational database (which ES is not), I would have two main tables linked by a foreign key, and I could search the two tables separately or with a join. In Elasticsearch, I am considering a parent-child relationship where "blog" is the parent document and the potentially many "post" documents associated with it are the children.
EDIT: I should explain why I want to index them this way. Basically, I want people to be able to search for blogs (the overall series of posts written by the same author), and the search terms might not be in the blog's description alone, but rather in the posts. For instance, a blog about Python might have a general description that talks about Python, but the blog posts might talk about Django, so if someone searches for "django" I'd like the Python blog to come up. Also, I want people to be able to search for specific posts. I also think (prove me wrong!) these need to be separate types of documents because they would have different fields, e.g. a post might have a date field, while a blog would not have that field.
In any case: Ideally, I would like to be able to offer a search function against "blog" which would also search against the "post" text (as the relevant text might be in the post); additionally, I'd like to allow users to search all posts regardless of what blog they are associated with.
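To make the two searches concrete, here is a rough sketch of the request bodies I have in mind, written as plain Python dicts. It assumes a single index that uses a `join` field plus the `has_child` query, which I am not sure is the right approach. The field names (`doc_relation`, `description`, `body`, `date`) and the function names are placeholders I made up, and I am assuming only blogs carry `description` and only posts carry `body`.

```python
# Hypothetical mapping: one index, with a join field relating blog -> post.
BLOG_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "doc_relation": {"type": "join", "relations": {"blog": "post"}},
            "description": {"type": "text"},  # assumed to exist on blogs only
            "body": {"type": "text"},         # assumed to exist on posts only
            "date": {"type": "date"},         # assumed to exist on posts only
        }
    }
}


def blog_search_body(terms):
    """Find blogs whose own description matches OR that have a matching child post."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"description": terms}},
                    {"has_child": {"type": "post",
                                   "query": {"match": {"body": terms}}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


def post_search_body(terms):
    """Find posts by their text, regardless of which blog they belong to."""
    return {"query": {"match": {"body": terms}}}
```

With this layout, searching blogs for "django" would be `blog_search_body("django")`, and searching all posts would be `post_search_body("django")`.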
What are the best practices for setting this up? From what I can tell, Elasticsearch has removed the ability to have two types of documents in the same index, and parent-child relationships need to be within the same index. With this constraint, it seems like parent-child relationships would only work for relationships between documents of the same type, e.g. if you are indexing people and want to indicate who is whose parent and child (literally).
The other option would be to create two indexes: one for blogs (which would include the posts' texts) and a second one containing only the posts. But my instinct is that this would duplicate a tremendous amount of data, and it would also take a lot more work to keep everything updated and in sync with my main relational data store.
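Here is a minimal sketch of what I mean by the two-index option, just to show where the duplication comes from. The input row shapes (`id`, `blog_id`, `description`, `body`, `date`) and the function names are hypothetical stand-ins for my relational tables, not anything Elasticsearch requires.

```python
def build_blog_doc(blog, posts):
    """Document for a hypothetical 'blogs' index, with the post text copied in."""
    return {
        "blog_id": blog["id"],
        "description": blog["description"],
        # Every post body is duplicated here so a blog search can match it.
        "post_texts": [p["body"] for p in posts if p["blog_id"] == blog["id"]],
    }


def build_post_docs(posts):
    """Documents for a hypothetical 'posts' index, one per post."""
    return [
        {"post_id": p["id"], "blog_id": p["blog_id"],
         "body": p["body"], "date": p["date"]}
        for p in posts
    ]


# Made-up sample rows standing in for the relational store.
blog = {"id": 1, "description": "A blog about Python"}
posts = [
    {"id": 10, "blog_id": 1, "body": "Getting started with Django", "date": "2020-01-01"},
    {"id": 11, "blog_id": 2, "body": "Unrelated post", "date": "2020-01-02"},
]

blog_doc = build_blog_doc(blog, posts)
post_docs = build_post_docs(posts)
```

Each post body ends up stored twice (once in its blog document and once in its own post document), and any edit to a post means re-indexing both.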