Semant1ka
Semant1ka

Reputation: 687

How to efficiently do complex queries on MongoDB unindexed fields?

I am building filtering functionality for web application which should look like JIRA of TFS filtering query. So user should be able to filter on fields contents and use logical operators in filter query.

The data lives in MongoDB and the main challenge is that the fields on which we will filter should support not only strict equality but also full text search are difficult to index because they can vary for each user.

In a nutshell, there is a nested object, which has three other nested object that can have different amount of fields depending of user, field names are also set by user, so we don't know them.

For example document structure in collection can be:

{
_id: ObjectId()
storage: {
obj_1:{}
obj_2:{}
         }
},
{
_id: ObjectId()
storage: {
obj_1:{
field_1 : val,
field_2 : val
}
obj_2:{}
         }
}

I imagine queries will be something like:

find({$and:[{"storage.obj_1.field_1":{$regex: "va"}},{"storage.obj_1.var_2":"val"}]})

Unfortunately, I am not a database expert so the solutions that I see now are:

1) Use Elasticsearch as a search engine. But the question is: how do I set Elastic index if I don't know my documents structure?

2) Use sparse index in Mongo. But I will need to use regex for matching, is that solution better than Elastic?

So the question is: what is the best way to do filtering in such a DB structure?

p.s. I have put this question in SO and not Software Engineering because SO has more active members, pls keep your downvotes for later

Upvotes: 2

Views: 443

Answers (1)

xeraa
xeraa

Reputation: 10859

  1. Elasticsearch and MongoDB (much like a relational database) behave differently for indexing: In MongoDB you need to explicitly index every field (for a non $text index). In Elasticsearch every field is automatically indexed. Don't go too crazy on the number of fields in Elasticsearch, since there is a little overhead for each field (in terms of disk space though that has improved in version 6).
  2. As soon as you are having more than a test data set, regular expressions are often slow, since they can only use indices in specific cases and you need to define those indices explicitly. Maybe the $text index and search operator are more what you are looking for. That one can index every field in a collection as well. If you need more features and a system that is fully built for search, then Elasticsearch will be the better option though.

Upvotes: 1

Related Questions