Reputation: 4500
I have one collection with documents that have a "host" field, and I'm trying to match it to a document in a very large collection that has the same host. Both collections are on the order of a million documents. I'm still figuring out Mongo, but I believe I can do it crudely, iterating with Javascript. Is there a more efficient way?
Upvotes: 0
Views: 90
Reputation: 20703
In an RDBMS, that would be a JOIN, something MongoDB has traditionally lacked (releases from 3.2 on offer a limited equivalent, the $lookup aggregation stage).
It really depends on your use case and your data model. Data modeling differs between an RDBMS and a NoSQL database: for the former you model by asking "What answers can be provided by the data I have?", whereas for the latter you model by asking "Which questions do I need my data to answer?"
If you have a given host, the question is easy: "Which hosts in collection B match the given host I have?" Let's assume you have linked the documents via the _id field. Then you would simply do
db.B.find({fieldToMatch:<givenHostsIdValue>})
e.g.
db.B.find({runningOnHost: "e67848a7282919ac"})
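If you need to match every document of the first collection rather than one given host, MongoDB 3.2 and later can do that server-side instead of iterating in JavaScript. The following is only a minimal sketch under assumptions: the collection names A and B and the shared field name host are hypothetical, taken from the question's description rather than from a real schema. It builds a $lookup pipeline and, when run inside the mongo shell, creates an index on the matched field first.

```javascript
// Hypothetical names: collections "A" and "B", both storing the hostname in "host".
const joinPipeline = [
  // For each document in A, collect the documents in B that have the same host.
  { $lookup: { from: "B", localField: "host", foreignField: "host", as: "matches" } },
  // Keep only the documents from A that found at least one partner in B.
  { $match: { matches: { $ne: [] } } }
];

// This part only runs inside the mongo shell, where the global "db" exists.
if (typeof db !== "undefined") {
  // Without an index on B.host, every single lookup has to scan B.
  db.B.createIndex({ host: 1 });
  db.A.aggregate(joinPipeline).forEach(printjson);
}
```

With a million documents on each side, the index on the looked-up field is what keeps this from degenerating into a full scan of B per document, so it is worth checking that it exists before running the pipeline.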
In case you have to correlate all hosts to the second collection, you might (and most likely will) have to denormalize your data by embedding the host data into the documents of your other collection. For example, when you try to keep track of all services that have to run on any given host, your modeling could look like this:
{
  _id: "e67848a7282919ac",
  processes: ['httpd', 'mongod', 'varnish'],
  running: ['httpd', 'varnish'],
  host: {
    hostname: "web1.emea.mycompany.com",
    ip: "10.0.0.1",
    datacenter: "EMEA"
  }
}
This would describe ("document") a host in full, and you could ask several interesting questions of this collection:
db.hosts.find({processes:'httpd','host.datacenter':'us-east'})
to find all designated web servers in the us-east datacenter or
db.hosts.find({'host.hostname':/emea\.mycompany\.com$/},{host:1,processes:1,running:1})
to get the running processes and the ones which are supposed to run for all hosts of the domain emea.mycompany.com. Using the aggregation framework, you could even do extremely complex queries on that collection.
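As one illustration of such an aggregation, here is a rough sketch that lists, per host, the processes that are supposed to run but currently are not. It assumes the document shape shown above; the output field names hostname and missing and the helper missingProcesses are made up for this example. The helper mirrors the $setDifference step in plain JavaScript so you can see what the pipeline computes for a single document.

```javascript
// Sample document in the shape shown above (values are illustrative).
const sampleHost = {
  _id: "e67848a7282919ac",
  processes: ["httpd", "mongod", "varnish"],
  running: ["httpd", "varnish"],
  host: { hostname: "web1.emea.mycompany.com", ip: "10.0.0.1", datacenter: "EMEA" }
};

// Pipeline: per host, compute the designated processes that are not running,
// then keep only the hosts where something is missing.
const missingPipeline = [
  { $project: {
      hostname: "$host.hostname",
      missing: { $setDifference: ["$processes", "$running"] }
  } },
  { $match: { missing: { $ne: [] } } }
];

// Hypothetical helper: plain JavaScript equivalent of the $setDifference step
// for a single document.
function missingProcesses(doc) {
  const running = new Set(doc.running);
  return doc.processes.filter((p) => !running.has(p));
}
// missingProcesses(sampleHost) returns ["mongod"]

// This part only runs inside the mongo shell, where the global "db" exists.
if (typeof db !== "undefined") {
  db.hosts.aggregate(missingPipeline).forEach(printjson);
}
```

Because both arrays live in the same document, no second collection has to be consulted, which is the point of the denormalized model.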
Please have a close look at the docs on data modeling and the aggregation framework. Combined, they should make it possible to answer the questions you have about your data. ;)
Upvotes: 1