Reputation: 3226
I am working on a function that helps me find similar documents, sorted by score, using the full-text search feature of MongoDB Atlas.
I set my collection index as "dynamic".
I am looking for similarities in text fields, such as "name" or "description", but I also want to look in another field, "thematic", that stores integer values (ids) of thematics.
Example:
Let's say that I have a reference document as follows:
{
name: "test",
description: "It's a glorious day!",
thematic: [9, 3, 2, 33]
}
I want my search to match these integers in the thematic field and include their weight in the score calculation.
For instance, if I compare my reference document with:
{
name: "test2",
description: "It's a glorious night!",
thematic: [9, 3, 6, 22]
}
I want to increase the score since the thematic field shares the values 9 and 3 with the reference document.
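To make the intended overlap concrete, here is a minimal sketch in plain JavaScript. It has nothing to do with Atlas Search itself, and the helper name sharedThematics is hypothetical. It only shows which values the two example documents have in common:

```javascript
// Hypothetical helper (not part of any MongoDB API): returns the thematic ids
// that two documents have in common, in the order they appear in `a`.
function sharedThematics(a, b) {
  const other = new Set(b.thematic || []);
  return (a.thematic || []).filter((id) => other.has(id));
}

const reference = { name: "test", thematic: [9, 3, 2, 33] };
const candidate = { name: "test2", thematic: [9, 3, 6, 22] };

console.log(sharedThematics(reference, candidate)); // [ 9, 3 ]
```

The more values this overlap contains, the higher I would like the search score to be.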
Question:
What search operator should I use to achieve this? I can pass an array of strings as the query of a text operator, but I don't know how to proceed with integers.
Should I go for another approach, like splitting the array to compare into several compound.should.term queries?
Edit:
After a fair amount of searching, I found this here and here:
Atlas Search cannot index numeric or date values if they are part of an array.
Before I consider changing the whole data structure of my objects, I wanted to make sure that there is no workaround.
For instance, could it be done with custom analyzers?
Upvotes: 4
Views: 1027
Reputation: 3226
I solved it by adding a trigger to my collection. Each time a document is inserted or updated, the trigger updates the counterparts of thematic and other similar fields, e.g. _thematic, where I store the string values of the integers. I then use this _thematic field for search.
Here is a sample code demonstrating it:
exports = async function (changeEvent) {
  const fullDocument = changeEvent.fullDocument;
  // Delete events carry no fullDocument, so there is nothing to mirror.
  if (!fullDocument) return;

  // Convert every value of the array to its string form.
  const format = (itemSet) => Object.values(itemSet).map((item) => item.toString());

  const setter = {
    _thematic: fullDocument.thematic ? format(fullDocument.thematic) : [],
  };

  const docId = changeEvent.documentKey._id;
  const collection = context.services.get("my-cluster").db("dev").collection("projects");
  // Await the write so the function does not return before it completes.
  await collection.findOneAndUpdate({ _id: docId }, { $set: setter });
};
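For completeness, here is a rough sketch of how the _thematic field could then be queried. It assumes a search index named "default" and uses one compound.should clause per thematic value so that each shared value can add to the score. The function name buildSimilarityPipeline, the limit parameter, and the projected fields are my own choices for illustration, so adapt them to your setup:

```javascript
// Hypothetical helper: builds a $search aggregation pipeline for a reference document.
// Assumptions: the search index is named "default", and `_thematic` holds
// string copies of the integers stored in `thematic`.
function buildSimilarityPipeline(reference, limit = 10) {
  // Text similarity on the regular string fields.
  const should = [
    { text: { query: reference.name, path: "name" } },
    { text: { query: reference.description, path: "description" } },
  ];

  // One optional clause per thematic id, queried as a string.
  for (const id of reference.thematic || []) {
    should.push({ text: { query: id.toString(), path: "_thematic" } });
  }

  const pipeline = [{ $search: { index: "default", compound: { should } } }];

  // Keep the reference document itself out of the results when its _id is known.
  if (reference._id !== undefined) {
    pipeline.push({ $match: { _id: { $ne: reference._id } } });
  }

  pipeline.push(
    { $limit: limit },
    { $project: { name: 1, thematic: 1, score: { $meta: "searchScore" } } }
  );
  return pipeline;
}
```

The returned array can be passed to collection.aggregate(...). Because every clause sits under should, none of them is required to match, and a document that shares more thematic values should end up with a higher score.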
I'm pretty sure it can be done in a cleaner way, so if someone posts one, I'll switch the selected answer to hers/his.
Another way to solve this is to make a custom analyzer with a character mapping that replaces each digit with its string counterpart. I haven’t tried this one though. See https://docs.atlas.mongodb.com/reference/atlas-search/analyzers/custom/#mapping
Alternatives welcome!
Upvotes: 0