Shiid Shah Sury

Reputation: 333

MongoDB: writing a query for a search engine

I am trying to write a search script in MongoDB but can't figure out how to do it. Here is what I want to do.

Let's say I have a string array XD = ["the", "new", "world"].

Now I want to search the MongoDB documents for the strings in XD (using regex) and get the matching documents back. For example, given these documents:

{ _id: 1, _content: "there was a boy" }
{ _id: 2, _content: "there was a boy in a new world" }
{ _id: 3, _content: "a boy" }
{ _id: 4, _content: "there was a boy in world" }

I want to get back the documents whose _content contains strings from XD, along with how many of those strings each one contains, sorted by that count:

{ _id: 2, _content: "there was a boy in a new world", times: 3 }
{ _id: 4, _content: "there was a boy in world", times: 2 }
{ _id: 1, _content: "there was a boy", times: 1 }

The first document (_id: 2) contains all three ("the" in "there", "new" as "new", "world" as "world"), so it gets 3.

The second document (_id: 4) contains only two ("the" in "there" and "world" as "world"), so it gets 2.
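To pin down the ranking described above, here is a small in-memory reference in plain JavaScript. It is a sketch of the expected result, not a MongoDB query: rankBySearchTerms is a hypothetical helper name, and it assumes "contains" means a case-sensitive substring match (so "the" counts for "there"), that each term is counted at most once per document, and that documents with a score of 0 are dropped.

```javascript
// Hypothetical in-memory reference for the desired ranking (not a MongoDB query).
// Score = number of distinct search terms found anywhere in _content.
const XD = ["the", "new", "world"];

const docs = [
  { _id: 1, _content: "there was a boy" },
  { _id: 2, _content: "there was a boy in a new world" },
  { _id: 3, _content: "a boy" },
  { _id: 4, _content: "there was a boy in world" },
];

function rankBySearchTerms(documents, terms) {
  return documents
    .map((doc) => ({
      ...doc,
      // substring match, each term counted once however often it occurs
      times: terms.filter((term) => doc._content.includes(term)).length,
    }))
    .filter((doc) => doc.times > 0) // drop documents that match nothing
    .sort((a, b) => b.times - a.times); // highest score first
}

console.log(rankBySearchTerms(docs, XD));
```

With the four sample documents this yields _id 2 (times 3), _id 4 (times 2) and _id 1 (times 1), and leaves out _id 3.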

Upvotes: 3

Views: 249

Answers (1)

dikesh

Reputation: 3125

Here is what you can do.

Create a regex to match against _content:

var XD = ["the", "new", "world"];
var regex = new RegExp(XD.join("|"), "g");   // /the|new|world/g
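A slightly more defensive way to build the same regex, as a sketch: escapeRegExp is a hypothetical helper (not part of the original answer) that escapes regex metacharacters, so a search term such as "c++" is matched literally instead of breaking the pattern. The "g" flag matters because String.prototype.match only returns every match when the regex is global.

```javascript
// Hypothetical helper: escape regex metacharacters so each term is matched literally.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const XD = ["the", "new", "world"];
const regex = new RegExp(XD.map(escapeRegExp).join("|"), "g");

// With the "g" flag, match() returns an array of every hit, or null if none.
const matches = "there was a boy in a new world".match(regex);
console.log(regex.source); // the|new|world
console.log(matches); // [ 'the', 'new', 'world' ]
```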

Store a JavaScript function on the server that matches _content against the regex and returns the match count:

db.system.js.save(
   {
     _id: "findMatchCount",
     // matches str against regexStr and returns the length of the result (0 if no match)
     value : function(str, regexStr) {
        var matches = str.match(regexStr);
        return (matches !== null) ? matches.length : 0;
     }
   }
)
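As written, str.match(regexStr) does not return all the matches: match() only returns every hit for a RegExp with the "g" flag, a plain string pattern is converted to a non-global regex (so at most one match comes back), and an undefined pattern matches the empty string once. A minimal corrected sketch, in ES5 style and checked only as plain JavaScript (not inside MongoDB), builds the global regex itself and counts each distinct matched term once, which is what the question's times field asks for. It assumes regexStr is an alternation string such as "the|new|world".

```javascript
// Corrected sketch of findMatchCount (ES5 style; assumes an alternation pattern string).
function findMatchCount(str, regexStr) {
  // No text or no pattern: report zero instead of matching the empty string.
  if (typeof str !== "string" || !regexStr) {
    return 0;
  }
  // The "g" flag makes match() return every hit, not just the first one.
  var matches = str.match(new RegExp(regexStr, "g"));
  if (matches === null) {
    return 0;
  }
  // Count each distinct matched term once, so "the the" still scores 1.
  var seen = Object.create(null);
  var count = 0;
  for (var i = 0; i < matches.length; i++) {
    if (!seen[matches[i]]) {
      seen[matches[i]] = true;
      count++;
    }
  }
  return count;
}

console.log(findMatchCount("there was a boy in a new world", "the|new|world")); // 3
console.log(findMatchCount("there was a boy in world", "the|new|world")); // 2
console.log(findMatchCount("a boy", "the|new|world")); // 0
```

Because matches are non-overlapping, this counts literal terms correctly only when no term is a prefix of another at the same position; for the sample terms that is not an issue.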

Use the function with mapReduce

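db.test.mapReduce(
    function(regex) {
       emit(this._id, findMatchCount(this._content, regex));
    },
    function(key, values) {
        // reduce must return the same type as the emitted value (a number);
        // MongoDB only calls it for keys that have more than one value
        return values.reduce(function(a, b) { return a + b; }, 0);
    },
    { "out": { "inline": 1 } }
);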
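MongoDB's mapReduce calls the map function with no arguments and with this bound to the current document, so the regex parameter of the map function above is never filled in. Values the map function needs can be supplied through the scope option instead. The sketch below dry-runs a corrected map function in plain Node: emit and pattern are simulated globals standing in for MongoDB's server-side emit() and for the scope variable, and, like the original findMatchCount, it counts every occurrence. The mongo shell call in the final comment is untested.

```javascript
// Dry run of a corrected map function in plain Node (no MongoDB involved).
// pattern and emit are simulated here; inside MongoDB, pattern would come from
// the mapReduce scope option and emit() is provided by the server.
var pattern = "the|new|world";
var emitted = [];

function emit(key, value) {
  emitted.push({ _id: key, value: value });
}

// mapReduce calls map with no arguments and this set to the current document.
function mapFn() {
  var matches = this._content.match(new RegExp(pattern, "g"));
  emit(this._id, matches === null ? 0 : matches.length);
}

var docs = [
  { _id: 1, _content: "there was a boy" },
  { _id: 2, _content: "there was a boy in a new world" },
  { _id: 3, _content: "a boy" },
  { _id: 4, _content: "there was a boy in world" },
];

docs.forEach(function (doc) {
  mapFn.call(doc); // simulate how mapReduce invokes map for each document
});
console.log(emitted);

// Untested sketch of the equivalent mongo shell call:
// db.test.mapReduce(mapFn,
//   function (key, values) { return values.reduce(function (a, b) { return a + b; }, 0); },
//   { out: { inline: 1 }, scope: { pattern: "the|new|world" } });
```

For the sample documents the emitted values are 1, 3, 0 and 2 for _id 1 to 4, in line with the counts the question expects.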

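As written, this produces the output below. Note that every document gets a value of 1, even _id 3, which contains none of the terms: mapReduce calls the map function without arguments, so regex is undefined, and str.match(undefined) matches the empty string once.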

{
    "results" : [
        {
            "_id" : 1,
            "value" : 1
        },
        {
            "_id" : 2,
            "value" : 1
        },
        {
            "_id" : 3,
            "value" : 1
        },
        {
            "_id" : 4,
            "value" : 1
        }
    ],
    "timeMillis" : 1,
    "counts" : {
        "input" : 4,
        "emit" : 4,
        "reduce" : 0,
        "output" : 4
    },
    "ok" : 1
}

I am not sure how efficient this solution is, since mapReduce runs JavaScript on the server for every document.

Hope this helps.

Upvotes: 1
