Reputation: 2049
I've been working on a project to evaluate MongoDB's speed compared to another data store. To this end, I'm trying to perform a full scan over a collection I've made. I found out about the profiler, so I have that enabled and set to log every query. I have a collection of a million objects, and I'm trying to time how long it takes to scan the collection. Unfortunately, when I run
db.sampledata.find()
it returns immediately with a cursor to 1000 or so objects. So I wrote a Python script that iterates through the cursor to handle all the results. Here it is:
from pymongo import MongoClient

client = MongoClient()           # connect to the local mongod
db = client.argocompdb
data = db.sampledata

count = 0
my_info = data.find()            # this only returns a cursor, not the documents
for row in my_info:              # iterating pulls every batch from the server
    count += 1
print(count)
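A rough client-side timing wrapper would look something like this (just an illustrative sketch using Python's time module, not my actual harness):

import time
from pymongo import MongoClient

data = MongoClient().argocompdb.sampledata

start = time.time()                  # wall-clock start
count = sum(1 for _ in data.find())  # iterating consumes the full cursor
elapsed = time.time() - start        # total client-side scan time
print(count, elapsed)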
This seems to be taking the requisite time. However, when I check the profiler, there's no overall figure for the full query time; it's just a whole whack of "getmore" ops that take 3-6 millis each. Is there any way to do what I'm trying to do using the profiler instead of timing it in Python? I essentially just want to see a single total for how long the full scan took. Is that feasible?
I'm very new to MongoDB, so I'm very sorry if this has been asked before, but I couldn't find anything on it.
Upvotes: 2
Views: 1480
Reputation: 19347
The profiler is measuring the correct thing. The Mongo driver does not return all the records in the collection at once; it first gives you a cursor, then fetches the documents in batches as you iterate through it. Each of those batch fetches is one of the "getmore" operations you see in the profiler, so the profiler is measuring exactly what is being done.
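If you want a single server-side total anyway, one option is to sum the millis field over the profile entries for your namespace. A sketch, assuming profiling level 2 is on and the database is argocompdb as in your script:

from pymongo import MongoClient

db = MongoClient().argocompdb
# system.profile stores one document per profiled operation; the initial
# query is op "query" and each subsequent batch fetch is op "getmore".
total_ms = sum(
    entry["millis"]
    for entry in db.system.profile.find(
        {"ns": "argocompdb.sampledata", "op": {"$in": ["query", "getmore"]}}
    )
)
print(total_ms)

Clear the profiler output between runs, or the sum will include earlier operations.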
And I'd argue that this is a more correct metric than the one you are seeking, which I believe is the time it takes to read all the documents into your client. You actually don't want the Mongo driver to read every document into memory before returning; no application would perform well written that way, except for the smallest of collections. It's much more efficient for a client to read documents on demand, keeping the total memory footprint as small as possible.
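You can, however, control how many round trips the iteration takes. A sketch using pymongo's batch_size (the value 10000 is an arbitrary example):

from pymongo import MongoClient

data = MongoClient().argocompdb.sampledata
# Larger batches mean fewer "getmore" round trips, at the cost of more
# memory held per batch; 10000 is just an example value.
count = sum(1 for _ in data.find().batch_size(10000))
print(count)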
Also, what are you comparing this against? If you are comparing against a relational database, then it matters a great deal what your schema is in the relational DB, what your collections and documents look like in Mongo, and, of course, how each is indexed. Different choices can produce very different performance results, through no fault of either database engine.
The simplest, and therefore fastest, operations in Mongo will probably be lookups of tiny documents by their _id field, which is always indexed: db.collection.find({_id: ...}). If you really want to measure a linear scan, then the smaller the documents are, the faster the scan will be. But that isn't a very useful benchmark, as it essentially only measures how quickly the server can read data from disk.
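If all you need is the server-side time for one full scan, explain with executionStats may be simpler than the profiler. A sketch (assumes MongoDB 3.0+, where this form of explain is available):

from pymongo import MongoClient

db = MongoClient().argocompdb
# Ask the server to execute the full scan and report execution statistics.
result = db.command(
    {"explain": {"find": "sampledata", "filter": {}},
     "verbosity": "executionStats"}
)
print(result["executionStats"]["executionTimeMillis"])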
Upvotes: 2