AnchovyLegend

Reputation: 12538

Arango wildcard query

I am building a simple Arango query where, if the user enters "foo bar" (starting to type Foo Barber), the query returns results. The issue I am running into is going from a normal, single-space-separated string (i.e. imagine LET str = "foo barber" at the top) to multiple wildcard queries like the ones shown below.

I am also open to other approaches that would work for this, e.g. LIKE, PHRASE or similar.

The goal is that a single string like 'foo bar' returns search results for Foo Barber and similar.

    FOR doc IN movies
      SEARCH PHRASE(doc.name, [
        {WILDCARD: ["%foo%"]},
        {WILDCARD: ["%bar%"]}
      ], "text_en")
      RETURN doc

Upvotes: 0

Views: 528

Answers (1)

CodeManX

Reputation: 11915

If the search phrase black kni should find Black Knight but not Knight Black, then you should probably avoid tokenizing Analyzers such as text_en, because they split the value into separate words.

Instead, create a norm Analyzer that removes diacritics and allows for case-insensitive searching. In arangosh:

    var analyzers = require("@arangodb/analyzers");
    analyzers.save("norm_en", "norm", {"locale": "en_US.utf-8", "accent": false, "case": "lower"}, []);

Add the Analyzer in the View definition for the desired field (should be title and not name, shouldn't it?). You should then be able to run queries like:

  • FOR doc IN movies SEARCH ANALYZER(STARTS_WITH(doc.title, TOKENS("Black Kni", "norm_en")[0]), "norm_en") RETURN doc
  • FOR doc IN movies SEARCH ANALYZER(LIKE(doc.title, TOKENS("Black Kni%", "norm_en")[0]), "norm_en") RETURN doc
  • FOR doc IN movies SEARCH ANALYZER(LIKE(doc.title, CONCAT(TOKENS(SUBSTITUTE("Black Kni", ["%", "_"], ["\\%", "\\_"]), "norm_en")[0], "%")), "norm_en") RETURN doc
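The queries above only match if the View indexes the title field with the norm_en Analyzer. A minimal sketch of such a View definition for arangosh follows. It assumes the View is called movies (as in the queries) and that the documents live in a collection named moviesCollection, which is a hypothetical name not taken from the question:

```javascript
// Link properties for an arangosearch View.
// ASSUMPTION: "moviesCollection" is a placeholder for the real collection name.
const viewProperties = {
  links: {
    moviesCollection: {
      fields: {
        // Index doc.title with the custom norm Analyzer created earlier.
        title: { analyzers: ["norm_en"] }
      }
    }
  }
};

// In arangosh, where the global `db` object is available:
// db._createView("movies", "arangosearch", viewProperties);
```

If the View already exists, the same link properties could instead be applied to it rather than creating a new one.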

The search phrase Black Kni is normalized to black kni and then used for a prefix search, either using STARTS_WITH() or LIKE() with a trailing wildcard %. The third example escapes user-entered wildcard characters.
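If the search phrase comes from user input, the escaping could also be done on the client and the result passed as a bind parameter instead of being pasted into the query string. The following is only a sketch: escapeLikeWildcards and the @prefix bind parameter are names made up for illustration, and like the third example it escapes only % and _, not backslashes.

```javascript
// Escape the LIKE wildcard characters % and _ so they are matched literally.
// Hypothetical helper, mirrors the SUBSTITUTE() call in the third example.
function escapeLikeWildcards(input) {
  return input.replace(/[%_]/g, (ch) => "\\" + ch);
}

// The prefix is normalized by TOKENS() inside the query,
// then the trailing wildcard is appended.
const query = `
  FOR doc IN movies
    SEARCH ANALYZER(LIKE(doc.title, CONCAT(TOKENS(@prefix, "norm_en")[0], "%")), "norm_en")
    RETURN doc`;

const bindVars = { prefix: escapeLikeWildcards("Black Kni") };

// In arangosh, where the global `db` object is available:
// db._query(query, bindVars).toArray();
```

Using a bind parameter keeps the user's text out of the query string itself, so only the wildcard characters need special handling.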

Upvotes: 0
