Laszlo the Wiz
Laszlo the Wiz

Reputation: 574

ArangoDB: Querying multiple fields at the same time for partial match

I have a database containing product information (SKU, model number, descriptions, etc) and I'd like to have a relatively quick search function where a user can just type in a few letters or a word from any of the the text fields and then get a list of products that contain that phrase in any of those fields.

The number of items in the database will probably not be more than 100,000.

What would be the easiest way to accomplish this, without creating complex queries?

Upvotes: 2

Views: 678

Answers (1)

Nate Gardner
Nate Gardner

Reputation: 1717

It sounds like you're looking for an autocomplete. There are numerous ways to do this.

Indexing

No matter the solution you choose, you'll want to put some indices on your data. I recommend adding a skiplist to everything you're going to be searching, and an additional fulltext index on any long-form text (such as product description). String comparison uses skiplists, while only a FULLTEXT search will leverage a fulltext index.

Querying

You have some choices here.

LIKE

https://docs.arangodb.com/3.11/aql/functions/string/#like

You could run your search something like:

for product in warehouse
    filter like(product.model, @searchTerm, true) or
           like(product.sku, @searchTerm, true)
    return product

Advantage: simple query syntax, multiple attributes in one search, supports substrings, can search the middle of a body of text.

Disadvantage: relatively slow.

Fulltext

This is a lot more complex for querying, but is very responsive, and is the approach my application uses for its autocomplete.

let sku = (for result in fulltext("warehouse", "sku", "prefix:@seacrhTerm")
           return {sku: result.sku, model: result.model, description: result.description}
let model = (for result in fulltext("warehouse", "model", "prefix:@searchTerm")
           return {sku: result.sku, model: result.model, description: result.description}
let description = (for result in fulltext("warehouse", "description", "prefix:@searchTerm")
           return {sku: result.sku, model: result.model, description: result.description}

let resultsMatch = union(sku,model,description)

return resultsMatch

Advantage: Very fast, extremely responsive, can handle very long bodies of text with ease, searches anywhere in a text body.

Disadvantage: Complex query structure as you need one variable for every attribute you're searching, a fulltext index created on each of those attributes you're searching, and a union at the end. You may need to do a union of the unioned results depending on how advanced your search needs to be. Doesn't support substring searching.

Raw string comparison

Simply create a query that filters for results to be greater than or equal to your search term, but less than your search term with the last letter incremented by 1. Example is in the link under the Foxx portion of my answer. This leverages skiplists.

Advantage: Very fast as long as the field is not tremendously long. Extremely easy to implement.

Disadvantage: Doesn't support substring searches. Only searches the first part of a string. I.e. you must know the beginning of the field you're searching.

This will work very well for quickly searching something like a model number where your users will probably know the beginning of it, but poorly for something like a description in which your users are probably searching for words somewhere in the middle of a body of text.

Foxx

Jan's little Cookbook example is a good place to start:

https://github.com/arangodb/docs/blob/2c842774b457114c571abdaa8391a038715d1458/3.3/cookbook/use-cases-populating-an-autocomplete-textbox.md

I would recommend abstracting whatever you do into a Foxx service. It is especially liberating if you need to dynamically build up AQL queries in database, in case you have a huge number of fields and collections to search and you need to generate a Fulltext search dynamically.

Bottom line

Experiment and see which of these works best for you. My best guess is that you will find the Fulltext solution the best if you need to search on product descriptions. If you expect your users to always search the first few letters of a field, just use the comparison with a skiplist as it is very very fast.

Upvotes: 3

Related Questions