Reputation: 6980
I need to rank records in a dataset by the relevance to a certain query but not filter out data that is irrelevant. I would like to use Algolia for this if possible.
Imagine I have a dataset of crates of fruits and their geolocations.
[
{
"fruits": ["apple", "orange"],
"_geoloc": {"lat": 1, "lng": 2}
},
{
"fruits": ["banana", "apple"],
"_geoloc": {"lat": 8, "lng": 2}
},
{
"fruits": ["banana"],
"_geoloc": {"lat": 5, "lng": 2}
},
{
"fruits": ["apple", "banana"],
"_geoloc": {"lat": 8, "lng": 2}
},
{
"fruits": ["orange"],
"_geoloc": {"lat": 1, "lng": 2}
}
]
I need to query the data so that I return all of the data but ranked by the match to the input query and the proximity to the specified geolocation.
So, if the geolocation is {"lat": 1, "lng": 2}
and the query is apple, banana
the resulting ranked data would be something like this:
[
{
"fruits": ["apple", "banana"],
"_geoloc": {"lat": 8, "lng": 2}
},
{
"fruits": ["banana", "apple"],
"_geoloc": {"lat": 8, "lng": 2}
},
{
"fruits": ["apple", "orange"],
"_geoloc": {"lat": 1, "lng": 2}
},
{
"fruits": ["banana"],
"_geoloc": {"lat": 5, "lng": 2}
},
{
"fruits": ["orange"],
"_geoloc": {"lat": 1, "lng": 2}
}
]
The record that matches the query exactly comes first, then come records with a different word ordering, then records with some of the words (but closer proximity), and finally the record(s) with no matching words.
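To make the intended ordering concrete, here is a minimal sketch in plain Python (no Algolia involved). The function name `rank_crates` and the sort key are assumptions chosen to reproduce the list above, and the planar distance is only a stand-in for a real geographic distance.

```python
from math import hypot


def rank_crates(crates, query_words, origin):
    """Return every crate, best match first; nothing is filtered out."""
    def sort_key(crate):
        fruits = crate["fruits"]
        # Number of query words present in the crate (more is better).
        matched = sum(1 for word in query_words if word in fruits)
        # Same words in the same order as the query ranks above a reordering.
        exact_order = fruits == list(query_words)
        # Planar distance as a simple stand-in for geographic distance.
        distance = hypot(crate["_geoloc"]["lat"] - origin["lat"],
                         crate["_geoloc"]["lng"] - origin["lng"])
        return (-matched, 0 if exact_order else 1, distance)

    return sorted(crates, key=sort_key)


crates = [
    {"fruits": ["apple", "orange"], "_geoloc": {"lat": 1, "lng": 2}},
    {"fruits": ["banana", "apple"], "_geoloc": {"lat": 8, "lng": 2}},
    {"fruits": ["banana"], "_geoloc": {"lat": 5, "lng": 2}},
    {"fruits": ["apple", "banana"], "_geoloc": {"lat": 8, "lng": 2}},
    {"fruits": ["orange"], "_geoloc": {"lat": 1, "lng": 2}},
]

ranked = rank_crates(crates, ["apple", "banana"], {"lat": 1, "lng": 2})
for crate in ranked:
    print(crate["fruits"], crate["_geoloc"])
```

All five records come back, which is the behaviour I am trying to get out of Algolia.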
So far I have used the Dashboard in Algolia to play around with this. However, irrelevant records are always filtered out, when the desire here is to always show all data (just sorted).
With the querying strategy described above it would return something like:
[
{
"fruits": ["apple", "banana"],
"_geoloc": {"lat": 8, "lng": 2}
},
{
"fruits": ["banana", "apple"],
"_geoloc": {"lat": 8, "lng": 2}
}
]
The data matching the query is returned but not the rest. Even the data missing a keyword is removed.
I have considered using disjunctive faceting to achieve this but this has two problems:
1. I need full-text search with typo tolerance within the word query. For example, the user could add a facet of "apple" or "cooked apples" and the records containing "apple" would still be highly ranked. Conversely, there is no limitation to what may be in the "fruits" array. That array may also contain typos or related but not exact matches.
2. Records not matching the query would still not be returned. With faceting, records whose fruits array contains only "orange" or only "banana" would still not be returned.
Upvotes: 1
Views: 383
Reputation: 3177
There are two ways you can use Algolia for doing this: as a search engine or as a primary data source. As the first option is what Algolia recommends, I'll start with this one.
Because Algolia is a search engine, it is designed to process search queries and return the subset of your records that is relevant to each query.
This means Algolia is not meant to be used as a primary data-source: most of the time the engine won't return all your objects but rather the most relevant ones for the current query. This difference between a search engine and a regular database allows all the optimisations that make Algolia so fast.
For your use-case of sorting all your crates according to content and position, you could use Algolia for knowing which ones are relevant to a query, then sort your whole dataset using that information.
For example, you can get the list of crates from your primary database and, in parallel, perform a query in Algolia to check which ones are the most relevant.
Then you would display the Algolia results first, followed by the remaining crates from your list (maybe with a note such as "These crates don't contain the fruits you requested (apple banana)").
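A minimal sketch of that merge step, assuming both the Algolia hits and the rows from your primary database carry the same identifier (here `objectID`). The function name `merge_relevant_first` and the `matched` flag are hypothetical, and no Algolia call is made: the hits are passed in as plain dictionaries.

```python
def merge_relevant_first(algolia_hits, all_crates, id_key="objectID"):
    """Algolia hits first (in Algolia's order), then the crates it did not return."""
    seen = {hit[id_key] for hit in algolia_hits}
    # The flag lets the UI show a "no requested fruits" note on the rest.
    matched = [dict(hit, matched=True) for hit in algolia_hits]
    rest = [dict(crate, matched=False)
            for crate in all_crates
            if crate[id_key] not in seen]
    return matched + rest


# Hypothetical data: IDs are made up for illustration.
all_crates = [
    {"objectID": "1", "fruits": ["apple", "orange"]},
    {"objectID": "2", "fruits": ["banana", "apple"]},
    {"objectID": "3", "fruits": ["banana"]},
    {"objectID": "4", "fruits": ["apple", "banana"]},
    {"objectID": "5", "fruits": ["orange"]},
]
algolia_hits = [
    {"objectID": "4", "fruits": ["apple", "banana"]},
    {"objectID": "2", "fruits": ["banana", "apple"]},
]

merged = merge_relevant_first(algolia_hits, all_crates)
print([crate["objectID"] for crate in merged])  # ['4', '2', '1', '3', '5']
```

The remaining crates keep the order of the primary database here; sorting them by distance would be a separate step on your side.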
You would set your index settings so that the `geo` criterion is used to rank the crates whose `fruits` match the query. Algolia would then return all crates that contain every fruit in the query, sorted by geographic proximity.
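One possible reading of those settings, written as a plain Python dictionary. The exact values are an assumption: `searchableAttributes` and `ranking` are real Algolia index settings, but restricting `ranking` to `geo` alone is only a guess at what fits the description, and the variable name `index_settings` is hypothetical. The dictionary would be passed to whichever set-settings call your Algolia client version provides.

```python
# Assumed index settings: search only in "fruits", rank matches by distance.
index_settings = {
    # Only the fruits array is searched for the query words.
    "searchableAttributes": ["fruits"],
    # Assumption: keep "geo" as the sole ranking criterion so that
    # matching crates are ordered purely by geographic proximity.
    "ranking": ["geo"],
}

print(index_settings)
```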
You could also use `removeWordsIfNoResults=allOptional`, so that if a user types `orange kiwi` and no crate contains both, you would get the crates that contain only `orange` or only `kiwi`. Likewise, if a user typed `kiwi` and no crate contains it, the engine would return all crates simply sorted by geolocation.
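As a sketch, the search parameters for such a query could be assembled like this. `aroundLatLng`, `aroundRadius` and `removeWordsIfNoResults` are Algolia search parameters; the helper name `build_search_params` is hypothetical, and setting `aroundRadius` to `"all"` (so that distance only sorts and never filters) is an addition on my part rather than something required by the approach above.

```python
def build_search_params(query, lat, lng):
    """Build Algolia search parameters for a geo-sorted fruit query."""
    return {
        "query": query,
        # Reference point used by the geo ranking criterion.
        "aroundLatLng": f"{lat},{lng}",
        # Assumption: do not filter by distance, only sort by it.
        "aroundRadius": "all",
        # If no record contains every word, treat all words as optional.
        "removeWordsIfNoResults": "allOptional",
    }


params = build_search_params("orange kiwi", 1, 2)
print(params)
```

These parameters would be sent with the search request of whichever client you use.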
If your data is only stored in Algolia, you can do two queries: a first one to get all your records and a second one to get the relevant results. You can then merge them by putting the relevant ones first, and display the resulting list.
You would use `search` for getting the relevant results, and `browse` for getting all your records by batches of 1000. Once you have both lists, you'll just have to display the relevant crates, remove the duplicates from the second list, then display the remaining crates.
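The de-duplication step could look like the sketch below. It assumes the `search` hits and the `browse` batches have already been fetched and are passed in as plain dictionaries keyed by `objectID`; the generator name `relevant_then_rest` is hypothetical, and no Algolia call is made here.

```python
from itertools import chain


def relevant_then_rest(search_hits, browse_batches):
    """Yield the search hits first, then every browsed record not yet seen."""
    seen = set()
    for hit in search_hits:
        seen.add(hit["objectID"])
        yield hit
    # browse_batches is any iterable of lists, e.g. batches of up to 1000 records.
    for record in chain.from_iterable(browse_batches):
        if record["objectID"] not in seen:
            seen.add(record["objectID"])
            yield record


# Hypothetical data standing in for the two Algolia responses.
search_hits = [{"objectID": "4"}, {"objectID": "2"}]
browse_batches = [
    [{"objectID": "1"}, {"objectID": "2"}, {"objectID": "3"}],
    [{"objectID": "4"}, {"objectID": "5"}],
]

ordered_ids = [r["objectID"] for r in relevant_then_rest(search_hits, browse_batches)]
print(ordered_ids)  # ['4', '2', '1', '3', '5']
```

Because it is a generator, the relevant crates can be displayed before all browse batches have arrived.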
The index settings would be the same as previously, to filter first by content and then order by geolocation.
As in the first approach, you can also use `removeWordsIfNoResults` to remove words from the query until the engine finds relevant results.
Upvotes: 2