littlely

Reputation: 1428

pymongo.errors.DocumentTooLarge: 'aggregate' command document too large

I am aggregating data with pymongo. The data is about 10 million documents. I use

df_list = mycol.aggregate([{'$match': {'tbl_id': {'$in': doc_list}}}, {'$project': {'_id': 0}}], allowDiskUse=True) # doc_list may be very large

to aggregate the data, but it still fails with pymongo.errors.DocumentTooLarge: 'aggregate' command document too large. Setting allowDiskUse=True does not help. How can I deal with it?

Upvotes: 4

Views: 2794

Answers (1)

Kailegh

Reputation: 197

A MongoDB (BSON) document cannot be larger than 16 MB. The command you send to the server is itself such a document, which means that your query object:

df_list = mycol.aggregate([{'$match': {'tbl_id': {'$in': doc_list}}}, {'$project': {'_id': 0}}], allowDiskUse=True) # doc_list may be very large

cannot be larger than 16 MB either. You probably have a huge doc_list that pushes the command over the maximum allowed size. The allowDiskUse option does nothing here, because the error is raised before the aggregation ever runs.

You can split your query into multiple queries by batching your doc_list. It is slower, but it does the trick.
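
A minimal sketch of that batching, assuming mycol and doc_list are the collection and id list from the question. The helper names chunked and aggregate_in_batches and the default batch_size of 10,000 are made up for illustration. The batch size that actually fits under the limit depends on how large each tbl_id value is, so treat the number as a starting point to tune:

```python
from itertools import islice


def chunked(values, size):
    """Yield successive lists of at most `size` items from `values`."""
    it = iter(values)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def aggregate_in_batches(collection, ids, batch_size=10_000):
    """Run the same $match/$project pipeline once per batch of ids.

    `batch_size` is an assumed default, not a recommended value: pick it so
    that each command stays well below the 16 MB document limit.
    """
    for batch in chunked(ids, batch_size):
        pipeline = [
            {'$match': {'tbl_id': {'$in': batch}}},
            {'$project': {'_id': 0}},
        ]
        # Each call sends a small command; results are streamed to the caller.
        yield from collection.aggregate(pipeline, allowDiskUse=True)
```

It would be used as df_list = list(aggregate_in_batches(mycol, doc_list)). If doc_list contains the same id more than once, the matching documents can come back once per batch that contains it, so deduplicating doc_list first may be worthwhile.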

Upvotes: 1
