IamIC

Heavy math queries and NoSQL databases

I have a very specific data format and querying need, and I need to know how well NoSQL DBs suit it. I am not asking "which DB is best"; I am interested in capabilities.

I need to store data in EAV (entity-attribute-value) style. Document stores with sparse indexes are perfect for this: I can create an index on each parameter's values, and a record only appears in the index of a parameter it actually has. When querying, only the needed indexes will be touched. MongoDB, for example, is perfect for this. This is need #1.
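A minimal in-memory sketch of this EAV layout, with one sparse index per attribute, may make the idea concrete. It is plain Python rather than MongoDB, and the class name `EAVStore` and the attribute names are hypothetical; in MongoDB the analogous step would be creating a sparse index on each attribute field.

```python
from collections import defaultdict


class EAVStore:
    """Hypothetical in-memory EAV store with one sparse index per attribute."""

    def __init__(self):
        self.records = {}                 # record id -> {attribute: value}
        self.indexes = defaultdict(dict)  # attribute -> {record id: value}

    def insert(self, rec_id, attrs):
        # Assumes each rec_id is inserted only once (no update handling).
        self.records[rec_id] = dict(attrs)
        # Only the attributes this record actually has get index entries,
        # which is what makes each per-attribute index "sparse".
        for attr, value in attrs.items():
            self.indexes[attr][rec_id] = value

    def indexed_ids(self, attr):
        """Ids of records that carry `attr`; other records never touch this index."""
        return set(self.indexes.get(attr, {}))


store = EAVStore()
store.insert(1, {"mass": 2.0, "temp": 300.0})
store.insert(2, {"mass": 5.0})
print(store.indexed_ids("temp"))  # {1}
```

Record 2 has no `temp` value, so it never appears in the `temp` index, and a query on `temp` never has to look at it.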

The query is in two stages. The first is a simple equivalent of a SQL "WHERE" clause and involves a series of comparisons (<, =, >) against real numbers. The results could be in the tens of thousands of records, but would typically be in the thousands. This is need #2.
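Stage 1 can be sketched as follows, assuming each attribute's index is kept sorted by value (roughly what an ordered database index provides): each range predicate becomes two binary searches, and the per-attribute id sets are intersected. Inclusive bounds (`lo <= v <= hi`) and the function names are assumptions for illustration.

```python
import bisect


def build_index(pairs):
    """pairs: iterable of (record_id, value). Returns parallel lists sorted by value."""
    entries = sorted((value, rec_id) for rec_id, value in pairs)
    return [v for v, _ in entries], [r for _, r in entries]


def range_ids(index, lo, hi):
    """Ids of records whose value v satisfies lo <= v <= hi (two binary searches)."""
    values, ids = index
    start = bisect.bisect_left(values, lo)
    end = bisect.bisect_right(values, hi)
    return set(ids[start:end])


def stage1(indexes, predicates):
    """predicates: {attribute: (lo, hi)}. Returns ids matching every predicate."""
    result = None
    for attr, (lo, hi) in predicates.items():
        matched = range_ids(indexes[attr], lo, hi)
        result = matched if result is None else result & matched
        if not result:  # nothing left to intersect, stop early
            break
    return result or set()


indexes = {
    "mass": build_index([(1, 2.0), (2, 5.0), (3, 7.5)]),
    "temp": build_index([(1, 300.0), (3, 310.0)]),
}
print(stage1(indexes, {"mass": (1.0, 8.0), "temp": (305.0, 400.0)}))  # {3}
```

Strict comparisons can be handled by swapping `bisect_left` and `bisect_right` at the corresponding bound.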

The second stage involves heavy mathematics that I have to perform on the stage 1 results in order to rank them. This math involves heavy use of powers and simpler operations. The results are then sorted by rank, and the "top 100" returned to the client. This is need #3.
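Stage 2 in sketch form, assuming the stage 1 candidates are already on the client. The `score` function is a placeholder for the real formula (here a weighted sum of powers with hypothetical weights and exponents); the point is the top-k selection with `heapq.nlargest`, which avoids fully sorting thousands of records when only the top 100 are needed.

```python
import heapq


def score(record, params):
    """Placeholder rank: sum of weight * value**exponent over the given attributes.
    `params` maps attribute -> (weight, exponent); both are hypothetical.
    Assumes non-negative values, since a negative base with a fractional
    exponent yields a complex number in Python."""
    return sum(w * record[attr] ** p for attr, (w, p) in params.items())


def top_k(candidates, params, k=100):
    """candidates: {record_id: {attribute: value}} from stage 1.
    Returns up to k (score, record_id) pairs, highest score first."""
    scored = ((score(rec, params), rec_id) for rec_id, rec in candidates.items())
    return heapq.nlargest(k, scored)


candidates = {
    1: {"a": 2.0, "b": 1.0},
    2: {"a": 3.0, "b": 0.5},
    3: {"a": 1.0, "b": 4.0},
}
params = {"a": (1.0, 2), "b": (0.5, 3)}
print(top_k(candidates, params, k=2))  # [(33.0, 3), (9.0625, 2)]
```

`heapq.nlargest` keeps a heap of at most k items, so the selection costs roughly O(n log k) instead of the O(n log n) of a full sort.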

MongoDB is the only NoSQL DB I'm relatively familiar with, so I'll use it as a reference. I don't believe it can perform math in queries, and even if it could, it would likely be slow. I believe the math needs to be performed on the client (in C or CUDA). This means that the data needs to be transferred very rapidly from the DB to the client. I know MongoDB has a native binary protocol, whereas Couchbase, for example, uses REST, which I believe will make it slower at transferring large datasets.

The reason I haven't settled on MongoDB is that I need distributed servers, for which Couchbase, for example, seems better suited.

So I need a solution that can either perform fast math internally, thus limiting the number of records to be transferred, or transfer records very rapidly so that they can be processed on the client. I understand that the only way to know for sure is to test, but what I don't know, hence this question, is which NoSQL DBs have the capabilities mentioned above.

Answers (1)

hymloth

MongoDB provides server-side JavaScript execution, which may solve some of your problems, but I can't say how efficient it is. However, I suspect that your workflow is I/O bound (you mentioned thousands of records), so it will probably be better not to do the processing on the client. Of course, a benchmark will tell the truth, but I propose another solution.

Have you tried Redis? It has powerful sorted sets that fit your range and rank queries perfectly. Additionally, the next version will introduce Lua scripting, which would let the ranking run on the server and so remove the I/O bottleneck in your workflow. Keep in mind that Redis is very fast.
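To illustrate the range and rank semantics being suggested, here is a pure-Python stand-in for a Redis sorted set. It only mimics the behaviour of the Redis commands `ZADD`, `ZRANGEBYSCORE` and `ZREVRANGE`; it is not the redis-py client API, and the class name `SortedSet` is hypothetical.

```python
import bisect


class SortedSet:
    """Hypothetical in-memory stand-in for a Redis sorted set.
    Members are ordered by score, ties broken by member (as Redis does)."""

    def __init__(self):
        self._scores = {}   # member -> score
        self._entries = []  # sorted list of (score, member)

    def zadd(self, member, score):
        old = self._scores.get(member)
        if old is not None:
            # Re-scoring an existing member: drop its old entry first.
            i = bisect.bisect_left(self._entries, (old, member))
            del self._entries[i]
        self._scores[member] = score
        bisect.insort(self._entries, (score, member))

    def zrangebyscore(self, lo, hi):
        """Members with lo <= score <= hi, in ascending score order."""
        # (lo,) sorts before any (lo, member), so this finds the first score >= lo.
        i = bisect.bisect_left(self._entries, (lo,))
        out = []
        while i < len(self._entries) and self._entries[i][0] <= hi:
            out.append(self._entries[i][1])
            i += 1
        return out

    def zrevrange(self, start, stop):
        """Members by descending score, ranks start..stop inclusive.
        Only non-negative ranks are supported in this sketch."""
        rev = self._entries[::-1]
        return [member for _, member in rev[start:stop + 1]]


s = SortedSet()
s.zadd("a", 1.5)
s.zadd("b", 3.0)
s.zadd("c", 2.2)
s.zadd("a", 4.0)                  # re-scores "a"
print(s.zrangebyscore(2.0, 3.0))  # ['c', 'b']
print(s.zrevrange(0, 1))          # ['a', 'b']
```

With a real Redis server, one sorted set per attribute could answer the stage 1 ranges, and a set scored by the stage 2 rank could return the top 100 with `ZREVRANGE key 0 99`; whether that beats client-side processing is something only a benchmark can settle.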
