Reputation: 1428
I am aggregating data with pymongo. The collection holds about 10 million documents, and I use
df_list = mycol.aggregate([{'$match': {'tbl_id': {'$in': doc_list}}}, {'$project': {'_id': 0}}], allowDiskUse=True) # doc_list may be very large
to aggregate the data, but it fails with pymongo.errors.DocumentTooLarge: 'aggregate' command document too large. Setting allowDiskUse=True does not help. How can I deal with this?
Upvotes: 4
Views: 2794
Reputation: 197
MongoDB documents cannot be larger than 16 MB. The command you send to the server is itself a BSON document, which means that your query object:
df_list = mycol.aggregate([{'$match': {'tbl_id': {'$in': doc_list}}}, {'$project': {'_id': 0}}], allowDiskUse=True) # doc_list may be very large
cannot be larger than 16 MB either. You probably have a huge doc_list that exceeds the maximum allowed size. The allowDiskUse option does nothing here because the error occurs before the pipeline ever runs.
You can split your query into multiple queries by batching doc_list. It is slower, but it does the trick.
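A minimal sketch of that batching, assuming the same mycol and doc_list as in the question. The helper names chunked and aggregate_in_batches and the default batch_size of 10,000 are my own illustrative choices, not part of pymongo; pick a batch size that suits the size of your ids.

```python
from itertools import islice


def chunked(values, size):
    """Yield successive lists of at most `size` items from `values`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(values)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def aggregate_in_batches(collection, ids, batch_size=10_000):
    """Run the $match/$project pipeline once per batch of ids.

    `batch_size` is an assumed default; tune it to the size of your ids.
    """
    for batch in chunked(ids, batch_size):
        pipeline = [
            {'$match': {'tbl_id': {'$in': batch}}},
            {'$project': {'_id': 0}},
        ]
        # Each command carries at most `batch_size` ids instead of the whole list.
        yield from collection.aggregate(pipeline, allowDiskUse=True)
```

Usage would look like df_list = list(aggregate_in_batches(mycol, doc_list)). If doc_list can contain the same id more than once, deduplicate it first (for example with list(set(doc_list))), otherwise an id that lands in two different batches returns its documents twice.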
Upvotes: 1