Jose Ramon

Reputation: 5442

Check if an element exists in MongoDB

I want to check, in an if statement, whether an array exists in the DB. So far I am doing this check inside the cursor loop, but I suspect that it slows the query down. My code so far is:

EDIT:

from scipy import sparse
import scipy.io

# input_file, output_file and collection (a pymongo Collection)
# are defined elsewhere in the script.
with open(input_file) as f:
    lines = [line.rstrip() for line in f]

print(len(lines))
row_no = len(lines)
col_no = len(lines)
matrix = sparse.lil_matrix((row_no, col_no))

no_row = 0
counter = 0
for item in lines:
    # find in the database the items whose id is in the lines list and that contain a follower list
    for doc in collection.find({"_id.uid": int(item)}):
        # .get() also skips documents that have no list_followers field at all
        followers = doc.get('list_followers')
        if followers is None:
            continue
        uid = doc['_id']['uid']
        counter += 1
        print(counter)
        print(uid)
        name = doc['screenname']
        # text.write('%s \n' % name)
        print(len(followers))
        for follower in followers:
            # the membership test guarantees that lines.index() cannot raise
            if follower in lines:
                col = lines.index(follower)
                matrix[no_row, col] = 1
                print(no_row, col, matrix[no_row, col])
        no_row += 1
        print(no_row)

scipy.io.mmwrite(output_file, matrix, field='integer')

Finally, I discovered that the delay was due to the creation of the sparse.lil_matrix.
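Since the delay came from building the sparse.lil_matrix, one common alternative (not something from the original post) is to collect the coordinates of the nonzero entries first and create the matrix in one step afterwards. The sketch below covers only the coordinate-collection part, in plain Python, and also replaces the repeated lines.index(follower) scans with a dict lookup. The function name follower_coordinates and the input shape (a list of (uid, followers) pairs standing in for the filtered cursor results) are hypothetical, and it assumes follower ids compare equal to the entries of lines (e.g. both are strings).

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def follower_coordinates(
    lines: Sequence[str],
    users: Iterable[Tuple[str, Sequence[str]]],
) -> Tuple[List[int], List[int]]:
    """Collect the (row, col) positions of the 1-entries of the follower matrix.

    `users` is a hypothetical stand-in for the MongoDB results: (uid, followers)
    pairs for users whose follower list is not None. Each user gets the next
    row number, as no_row does in the question's loop.
    Assumes follower ids have the same type as the entries of `lines`.
    """
    # Map each id to its first position in `lines` (what lines.index()
    # returns), so each lookup is O(1) instead of a linear scan.
    position: Dict[str, int] = {}
    for i, uid in enumerate(lines):
        position.setdefault(uid, i)

    rows: List[int] = []
    cols: List[int] = []
    for row, (_uid, followers) in enumerate(users):
        for follower in followers:
            col = position.get(follower)
            if col is not None:  # skip followers that are not in `lines`
                rows.append(row)
                cols.append(col)
    return rows, cols


# Usage with made-up ids:
lines = ["10", "20", "30"]
users = [("10", ["20", "99"]), ("20", ["10", "30"])]
rows, cols = follower_coordinates(lines, users)
print(rows, cols)  # [0, 1, 1] [1, 0, 2]
```

The two lists, together with a list of ones of the same length, can then be passed to scipy.sparse.coo_matrix((data, (rows, cols)), shape=(len(lines), len(lines))) and written out with scipy.io.mmwrite as before. Note that COO format sums duplicate coordinates, so a follower listed twice would produce a 2 unless the pairs are deduplicated first.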

Upvotes: 0

Views: 3185

Answers (1)

Neil Lunn

Reputation: 151112

The nearest thing I can think of is to implement a sparse index and query a little differently. I'll construct a sample to demonstrate:

{ "a" : 1 }
{ "a" : 1, "b" : [ ] }
{ "a" : 1 }
{ "a" : 1, "b" : [ ] }
{ "b" : [ 1, 2, 3 ] }

Essentially, what you seem to be asking is to get just that last document as a match without scanning everything. This is where a different query and a sparse index help. First, the query:

db.collection.find({ "b.0": { "$exists": 1 } })

This returns only one document, because it is the only one whose array has content at its first index position. Now the index:
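To see why only the last sample document matches, here is a simplified Python model of the { "b.0": { "$exists": 1 } } test run over the sample documents. It is an illustration, not MongoDB's implementation: it only handles a top-level array field, whereas MongoDB's dotted-path matching also covers cases such as an embedded document with a "0" key. The helper name has_first_element is hypothetical.

```python
from typing import Any, Dict, List


def has_first_element(doc: Dict[str, Any], field: str) -> bool:
    """Simplified model of {"<field>.0": {"$exists": true}}.

    True only when `field` holds a list with at least one element,
    i.e. when something exists at array index 0. A missing field and
    an empty array both fail the test.
    """
    value = doc.get(field)
    return isinstance(value, list) and len(value) > 0


# The sample documents from the answer.
sample: List[Dict[str, Any]] = [
    {"a": 1},
    {"a": 1, "b": []},
    {"a": 1},
    {"a": 1, "b": []},
    {"b": [1, 2, 3]},
]

matches = [doc for doc in sample if has_first_element(doc, "b")]
print(matches)  # [{'b': [1, 2, 3]}]
```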

db.collection.createIndex({ "b": 1 }, { "sparse": true })

But because of the nature of the query, MongoDB will not pick this index on its own, so we have to .hint() it:

db.collection.find({ "b.0": { "$exists": 1 } }).hint({ "b": 1 }).explain()

That returns the one matching document while considering only the 3 documents that actually have a "b" array.
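Applied to the question's collection from Python, the same idea could look roughly like the sketch below, written against PyMongo (Collection.create_index, Collection.find, Cursor.hint). It assumes the field names from the question (_id.uid and list_followers) and that uids are stored as ints, as the question's int(item) suggests. It also replaces the one-query-per-id loop with a single $in query, which is an addition of mine rather than part of the answer. The names FOLLOWERS_INDEX, followers_filter, ensure_followers_index and find_users_with_followers are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Sparse ascending index on the follower array, mirroring
# createIndex({ "b": 1 }, { "sparse": true }) from the answer.
FOLLOWERS_INDEX: List[Tuple[str, int]] = [("list_followers", 1)]


def followers_filter(uids: Iterable[int]) -> Dict[str, Any]:
    """Match the given uids whose list_followers has an element at index 0."""
    return {
        "_id.uid": {"$in": list(uids)},
        "list_followers.0": {"$exists": True},
    }


def ensure_followers_index(collection) -> str:
    """Create the sparse index once; `collection` is a PyMongo Collection."""
    return collection.create_index(FOLLOWERS_INDEX, sparse=True)


def find_users_with_followers(collection, uids: Iterable[int]):
    """Run the filtered query, forcing the sparse index with a hint."""
    return collection.find(followers_filter(uids)).hint(FOLLOWERS_INDEX)
```

Whether the hint pays off here depends on how many documents carry a non-empty follower list compared with how many ids are queried, so it is worth checking with .explain() as in the answer. Also keep in mind that an index on an array field is a multikey index with one entry per array element, so on long follower lists it can become large.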

Upvotes: 1
