jwillis0720
jwillis0720

Reputation: 4477

Checking position of an entry in an index MongoDB

I have a query using pymongo that is outputting some values based on the following:

cursor = db.collect.find({"index_field":{"$regex":'\s'}}
for document in cursor:
  print document["_id"]

Now this query has been running for a long time (over 500 million documents) as I expected. I was wondering though if there is a way to check where the query is in its execution by perhaps finding out where the last printed "_id" is in the indexed field. Like is the last printed _id halfway through the btree index? Is it near the end?

I want to know this just to see if I should cancel the query and reoptimize and/or let it finish, but I have no way of knowing where the _id exists in the query.

Also, if anyone has a way to optimize my whitespace query, that would be helpful to. Based on the doc, it seems if I would of used ignorecase it would of been faster, although it doesn't make sense for whitespace checking.

Thanks so much, J

Upvotes: 0

Views: 121

Answers (1)

Oran
Oran

Reputation: 877

Query optimization

Your query cannot be optimized, because it's an inefficient$regex search that's looking for the space \s in the the document. What you can do, is to search $regex for a prefix of \s, e.g.

db.collect.find({"index_field": {"$regex": '^\\s'}})

Check out the notes in the link

Indexing problem

$regex can only use an index efficiently when the regular expression has an anchor for the beginning (i.e. ^) of a string and is a case-sensitive match. Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.


DB op's info

Use db.currentOp() to get info on all of your running ops.

Upvotes: 1

Related Questions