berimbolo
berimbolo

Reputation: 3829

mongo db - efficient way to search strings

I am looking to search strings that are broken up into a prefix, numbers and letters, I currently need to search 55 million rows and a row looks like this (I also save a concatanated version of the entire string):

{ "prefix" : "AA", "numbers" : "02", "letters" : "AFO", "code" : "AA02AFO" "price" : "699" }

What is the most efficient way of searching if I want a combined search which will match the above exactly but also return matches like:

AA*AFO
A*AFO
*A02AFO
**02AFO

And so on, I want to return an exact match if this exists and I also want to return any records which are similar to the search record as well.

I currently have an index on all the fields except price and an index on the concatanated code. Is there also a way to ensure the query returns distinct results if a string matches multiple regexs?

Upvotes: 1

Views: 1836

Answers (3)

Hashcut
Hashcut

Reputation: 843

I would second Willis's answer and suggest that you do exact matches on each of the components of the string. However, if that's not flexible enough (i.e. you want to be able to do partial matches within each component as well), I'd recommend setting up a mirrored ElasticSearch engine, especially given how many records you have. MongoDB isn't really designed to do efficient freeform text searching.

My own personal experience was that with about 10 million records, even with indexing the concatenated string field, doing freeform searching was taking >30sec per query, even with enough RAM to hold the (pretty big) index. As I was trying to do realtime auto-complete on a text field, that wouldn't work :-)

ElasticSearch has several solutions for replicating data from mongodb into its own database automatically, so I found once I set everything up, it went pretty smoothly. I use monstache (https://github.com/rwynn/monstache) but there are others as well. I realize this is not exactly what you asked for, but I went through a painful process of trying to optimize mongodb for freeform text searching that didn't take forever and ultimately gave up. If this is going to be a common query, and you need decent performance, I highly suggest looking into something like ElasticSearch that's optimized for it.

Upvotes: 1

klhr
klhr

Reputation: 3390

First off, constructing this query as a regular expression against a string is going to be one of the least efficient ways of querying. Instead, you should query by the actual fields:

{
$or: [
  {prefix: "AA", numbers: "02", "letters": "AFO"}, -- full match
  {prefix: {$exists: true}, numbers: "02", "letters: "AFO" } -- varying prefix,
   ...etc
]}

Querying by the actual fields will allow Mongo to efficiently use indexes. A regular expression like /^..02AFO/ will require checking every single record in the index for matches.

I currently have an index on all the fields except price and an index on the concatanated code. Is there also a way to ensure the query returns distinct results if a string matches multiple regexs?

If you did want to match against regular expressions, you could use $in to combine multiple regular expressions & return only unique matching documents: {code: {$in: [/foo/, /bar/] }

You could also use $or: {$or: [{code: /foo/}, {code: /bar/}], otherCriterion: "yay"}.

Upvotes: 2

jakemingolla
jakemingolla

Reputation: 1649

You can use the $regex query operator to test each individual regex. You can cross-reference the results via the record's code to see if it is in multiple results:

const regexes = [
  /AA*AFO/,
   ...
];

const results = {};

regexes.forEach(async (regex) => {
  results[regex] = await db.collection.find(
    {
      code: {
        $regex: regex
       }
    });
});

Upvotes: 1

Related Questions