Zeal
Zeal

Reputation: 41

Mongodb query a collection based on if the field contains an array value as substring

Thanks in advance for any help guys! So, I have two collections: A and B.

A is a collection of personal information:

{
  "_id": "3453hkj54h5k34j5hkjh"  
  "location": "New York, U.S.",  
  "first-name": "Archer",  
  "last-name": "Vice",  
  "industry": "intelligence"
},
{
  "_id": "3453hkj5sdfdddjh",    
  "location": "London, UK",    
  "first-name": "Harry",    
  "last-name": "Potter",    
  "industry": "security"
},
{
  "_id": "345dfdf5sdfdddjh",
  "location": "D.C., US",
  "first-name": "Obama",  
  "last-name": "Barack",  
  "industry": "president"
}   

B is a collection of location information in united state:

{
  "_id": "998sdfdsfhejf",  
  "city": "New York",    
  "zip": "10122",  
  "state": "NY",  
  "lat": 40.749,  
  "longt": -73.9885
},  
{
  "_id": "998sdfsdfdsfhejf",  
  "city": "D.C."  
  "zip": "20500",  
  "state": "DC",  
  "lat": 38.8951,  
  "longt": -77.0369
}  

I what to find out who lives in US by comparing the location field in A against city field in B. B should be a sub string of A, as A often carries state, or country information.

I already converted B to an array by:

var f = db.collection.find(), n = [];
for (var i = 0; i < f.length(); i++) n.push(f[i]['field']);

now B is var n=["D.C.", "New York"]

I know how to check if something is in the array. you do:

db.database.find({
   field: 
      { 
         $in: array 
      } 
   }); 

To check substring you do this:

db.database.find({A: /substring/ });

or

db.database.find({A: {$regex: 'substring'}});

expected results are

{
  "_id": "3453hkj54h5k34j5hkjh",    
  "location": "New York, U.S.",   
  "first-name": "Archer",  
  "last-name": "Vice",  
  "industry": "intelligence"
},
{
  "_id": "345dfdf5sdfdddjh",
  "location": "D.C., US",  
  "first-name": "Obama",  
  "last-name": "Barack",  
  "industry": "President"
}   

"D.C., US" contains substring "D.C." which is a value in the array n=["D.C.", "New York"].

I know I can do it through mapreduce, but it really just seems to be a one liner. I'm also learning how to join these two collections.

Upvotes: 1

Views: 614

Answers (1)

while
while

Reputation: 3772

This is not super simple to do in a statement, but it is possible. If your list of search terms is as short as you have stated in the question you can do it in one line combining it into a regular expression like this:

 db.test.find({location: {$regex: new RegExp(n.join('|'))}})

If the list is not too long that is. It will be quite slow if the regexp gets too complex. If its very short you could ff course then you could also write out the RegExp literally.

n is defined in the shell as you have in the question. Here i used:

var n = ["D.C.", "New York"];

This will give the following result:

{ "_id" : "3453hkj54h5k34j5hkjh", "location" : "New York, U.S.", "first-name" : "Archer", "last-name" : "Vice", "industry" : "intelligence" }
{ "_id" : "345dfdf5sdfdddjh", "location" : "D.C., US", "first-name" : "Obama", "last-name" : "Barack", "industry" : "president" }

EDIT

Here is an alternative for your join if your list is too long:

n.reduce(function (lst, d) {
    var res = db.test.find({location: {$regex: d}}).toArray();
    Array.prototype.push.apply(lst, res); 
    return lst;
}, []);

It loops over all entries in your list and finds the matching entries and adds up all the results into a new list.

If you want you could insert them into a new collection instead to avoid keeping it all in memory. You could also use a search directly and not extract the result from collection B into a list. This should also be better in terms of memory.

This will save the result to a collection names test_result (using collections A and B in the searches):

db.B.find().forEach(function (d) { 
    db.test_result.insert(db.A.find({location: {$regex: d.city}}).toArray())
});

Upvotes: 4

Related Questions