MongoDB best array match

Question

Suppose I have the following structure:

[{    
    name: 'John',    
    tags: ['unix','databases']    
},    
{    
    name: 'Jane',    
    tags: ['excel', 'power-point','word', 'outlook']   
},   
{  
    name: 'Smith',  
    tags: ['databases', 'linux', 'android']  
}]

and I want to search for people with the tags ['databases', 'servers', 'c++'].

I want a query that tells me the two best matches are Smith and John, with one match each.

This feels similar to having two term vectors and computing their cosine similarity: http://en.wikipedia.org/wiki/Vector_space_model

P.S.
I realize I can probably do an $in query and then count the matching terms in my program (written in Java), but is there a way to get the answer from MongoDB itself?

Devesh · Accepted Answer

Why not use map-reduce? Build an inverted index of your tags in a new collection, storing the IDs of the matching documents against each tag. You can then count the matches across all the searched tags and sort by the number of matches, highest first. See http://ngsiolei.blogspot.com/2010/11/basic-inverted-index-in-mongodb.html for a walkthrough; it is aimed at text search, but the same technique applies here. This approach also leaves you room to add weights to your tags later to get better results.

The collection will look like this:

         C++ : ["James" , "J" ] , Database : ["James"]

So when I search for both C++ and Database, James appears under both tags and gets a total of 2, while J gets 1, so James is the better match. Each tag becomes one document, with the tag itself as the ID, so the search will be faster.

If you want an easier way, use the Aggregation Framework (http://docs.mongodb.org/manual/applications/aggregation/) and $unwind the tags field.
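
The Aggregation Framework route could look roughly like the sketch below. It assumes the documents from the question live in a hypothetical collection named people; the helper name bestMatchPipeline is only for illustration and is not part of MongoDB. The stages used ($match, $unwind, $group, $sort) are standard pipeline operators.

```javascript
// Tags to search for (taken from the question).
const searchTags = ['databases', 'servers', 'c++'];

// Builds an aggregation pipeline that ranks documents by how many
// of the given tags they contain. (Hypothetical helper name.)
function bestMatchPipeline(tags) {
  return [
    // Keep only documents that share at least one tag with the search.
    { $match: { tags: { $in: tags } } },
    // Emit one document per (person, tag) pair.
    { $unwind: '$tags' },
    // Drop the pairs whose tag was not searched for.
    { $match: { tags: { $in: tags } } },
    // Count the remaining tags per original document.
    { $group: { _id: '$_id', name: { $first: '$name' }, matches: { $sum: 1 } } },
    // Best matches first.
    { $sort: { matches: -1 } }
  ];
}

// In the mongo shell, assuming the collection is called `people`:
// db.people.aggregate(bestMatchPipeline(searchTags));
```

With the sample data from the question, this should return John and Smith with matches: 1 each, while Jane is filtered out by the first $match. The order among documents with equal counts is not defined, so add a second sort key if you need stable output.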
