Reputation: 265
I currently have MongoDB set up on an EC2 instance running Amazon Linux. The collection has around 1M documents.
On the same EC2 instance, I use pymongo's db.collection.find({}, {'attribute_1':1}) to query attribute_1 in all documents.
The problem is that after iterating over and retrieving around 200,000 documents, my Python code just stops working.
It does not show any error (I did use try/except), and the MongoDB log doesn't show any specific error either.
I strongly suspect it is because of the EC2 network bandwidth. However, I tried splitting the documents into batches of 100,000 documents each, and it still does not work. It just breaks automatically at around 200,000 documents. The code is below:
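from math import ceil

# Total number of documents, split into pages of 100,000
count = db.collection.count()
page = int(ceil(count/100000.0))
result = []
i = 0
for p in range(0, page):
    # Slicing the cursor applies skip/limit for this page
    temp = db.collection.find({}, {'attribute_1':1})[p*100000:p*100000+100000]
    for t in temp:
        # Every value is kept in result, so the list grows with each document
        result.append(t['attribute_1'])
        i = i+1
        print(i)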
I checked the EC2 log as well and found nothing unusual. The instance continued to work normally after the break (I could still access the command line, cd, ls, etc.). My EC2 instance is a c3.2xlarge. I have been stuck on this for a few days, so any help is appreciated. Thanks in advance.
Update: After searching the system log, I found these:
Apr 22 10:12:53 ip-xxx kernel: [ 8774.975653] Out of memory: Kill process 3709 (python) score 509 or sacrifice child
Apr 22 10:12:53 ip-xxx kernel: [ 8774.978941] Killed process 3709 (python) total-vm:8697496kB, anon-rss:8078912kB, file-rss:48kB
My EC2 instance already has 15 GB of RAM. attribute_1 is a Python list of words, and each one contains quite a large number of elements (words). Is there any way for me to fix this problem?
Upvotes: 0
Views: 803
Reputation: 2925
You appear to be creating a very large list, result, and that has exceeded the available memory in the instance. Generally this indicates that you need to redesign some part of your system so that Python only has to process the data you really need. A few options:
find returns a cursor - maybe you don't actually need the list at all.
There are other approaches, but an error like this should lead you to ask yourself: "Do I need all of this data in a Python list?"
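As a rough illustration of the cursor point, here is a minimal sketch that consumes documents one at a time and keeps only a small running summary instead of appending every word list to result. The word count is purely an assumption about what might be needed downstream (the question does not say), and count_words and fake_cursor are hypothetical names; fake_cursor merely stands in for the pymongo cursor so the snippet runs without a database.

```python
from collections import Counter


def count_words(cursor):
    """Consume documents one at a time and keep only a running word count."""
    counts = Counter()
    for doc in cursor:
        # Only the current document's word list is referenced here;
        # nothing is appended to a growing list of all documents.
        counts.update(doc.get('attribute_1', []))
    return counts


def fake_cursor(n):
    """Hypothetical stand-in for a pymongo cursor: lazily yields n small documents."""
    for k in range(n):
        yield {'_id': k, 'attribute_1': ['alpha', 'beta'] if k % 2 else ['alpha']}


# With a real collection this would be, for example:
#   counts = count_words(db.collection.find({}, {'attribute_1': 1}))
counts = count_words(fake_cursor(1000))
print(counts.most_common(2))
```

Memory use then depends on the size of the summary (here, the number of distinct words) rather than on the total number of documents. If the full data really is needed later, writing each document's result to a file inside the loop is another way to avoid holding everything in RAM.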
Upvotes: 2