Thrisundar Reddy J

Reputation: 73

How to handle the "document size exceeds 16MB" error when inserting a document into a collection in MongoDB

Can anyone please suggest how to handle the "document size exceeds 16MB" error when inserting a document into a collection in MongoDB? I found some solutions, such as GridFS. GridFS can handle this problem, but I need a solution without using GridFS. Is there any way to make the document smaller or split it into subdocuments? If yes, how can we achieve that?

from pymongo import MongoClient

conn = MongoClient("mongodb://sample_mongo:27017")
db_conn = conn["test"]
db_collection = db_conn["sample"]

# the size of record is 23MB

record = {
    "name": "drugs",
    "collection_id": 23,
    "timestamp": 1515065002,
    "tokens": [],          # list of strings
    "tokens_missing": [],  # list of strings
    "token_mapping": {}    # dict of transformed tokens
}

db_collection.insert(record, check_keys=False)

I got the error DocumentTooLarge: BSON document too large. In MongoDB, the maximum BSON document size is 16 megabytes.

  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 2501, in insert
check_keys, manipulate, write_concern)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 575, in _insert
check_keys, manipulate, write_concern, op_id, bypass_doc_val)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 556, in _insert_one
check_keys=check_keys)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 482, in command
self._raise_connection_failure(error)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 610, in _raise_connection_failure
raise error
  DocumentTooLarge: BSON document too large (22451007 bytes) - the connected server supports BSON document sizes up to 16793598 bytes.

Upvotes: 7

Views: 13244

Answers (2)

kevinadi

Reputation: 13785

The quick answer is no, you cannot get around the 16 MB BSON size limitation. If you hit this limit, you will need to explore alternatives such as GridFS or a different schema design for your documents.

I would start by asking a series of questions to determine the focus of your design, such as:

  1. You have fields called tokens, tokens_missing, and token_mapping. I imagine these fields are very large individually, and putting all three into one document pushes it over 16 MB. Is it possible to split this document into three collections instead (see the sketch after this list)?

  2. What is your application's access pattern? Which fields do you need to access all the time? Which fields do you access less often? You can split the document into different collections based on those patterns.

  3. Bear in mind the need to index the documents, since MongoDB's performance is highly tied to good indexes that support your queries. You cannot index two arrays in a single compound index. There is more information in Multikey Indexes.

  4. If you need to combine all the related data in a query, MongoDB 3.2 and newer provides the $lookup aggregation stage, which is similar to SQL's left outer join (see the $lookup sketch below).
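To make points 1 and 2 concrete, here is a minimal sketch of one possible split, using the record from the question. The collection names (sample_meta, sample_tokens) and the batch size are assumptions for illustration, not something prescribed by MongoDB:

from pymongo import MongoClient

db = MongoClient("mongodb://sample_mongo:27017")["test"]

record = {
    "name": "drugs",
    "collection_id": 23,
    "timestamp": 1515065002,
    "tokens": [],          # large list of strings
    "tokens_missing": [],  # large list of strings
    "token_mapping": {}    # large dict of transformed tokens
}

# Keep the small metadata in one document...
meta = {k: record[k] for k in ("name", "collection_id", "timestamp")}
db["sample_meta"].insert_one(meta)

# ...and spread each large array over many small documents,
# one batch per document, keyed back to the parent by collection_id.
BATCH = 10000  # hypothetical; tune so each document stays well under 16 MB
batches = [
    {"collection_id": record["collection_id"],
     "seq": i // BATCH,
     "tokens": record["tokens"][i:i + BATCH]}
    for i in range(0, len(record["tokens"]), BATCH)
]
if batches:
    db["sample_tokens"].insert_many(batches)

The same batching would apply to tokens_missing and token_mapping (the mapping could be flattened into one small document per batch of key/value pairs).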

Unlike SQL's normal-form schema design, MongoDB's schema design is based on your application's access pattern. The 16 MB limit is there to let you know that the design is probably not optimal, since such large documents are detrimental to performance, difficult to update, etc. Typically, it's better to have many small documents than a few gigantic ones.
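As a hedged illustration of point 4, a $lookup over the hypothetical split collections from the sketch above might look like this:

from pymongo import MongoClient

db = MongoClient("mongodb://sample_mongo:27017")["test"]

# Join each metadata document with its token batches at query time.
pipeline = [
    {"$match": {"collection_id": 23}},
    {"$lookup": {
        "from": "sample_tokens",       # hypothetical collection from the sketch above
        "localField": "collection_id",
        "foreignField": "collection_id",
        "as": "token_batches"
    }}
]
for doc in db["sample_meta"].aggregate(pipeline):
    print(doc["name"])  # each doc now carries its token_batches array

Note that each document produced by the aggregation pipeline is itself capped at 16 MB, so a $lookup that folds every batch back into a single result document would reproduce the original error; for very large joins, reading the batches with a plain find() cursor is safer.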

More examples can be found in Data Model Design and Data Model Examples and Patterns.

Upvotes: 1

Clement Amarnath

Reputation: 5466

The maximum BSON document size is 16 megabytes. To store documents larger than the maximum size, MongoDB provides the GridFS API.

GridFS is a specification for storing and retrieving files that exceed the BSON document size limit of 16 MB. GridFS stores a large file by dividing it into parts, or chunks, and storing each chunk as a separate document. The default GridFS chunk size is 255 kB. GridFS uses two collections to store files: one collection stores the file chunks, and the other stores file metadata.
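For completeness, a minimal PyMongo sketch of the GridFS approach, assuming the oversized record is serialized to JSON first (the filename and the extra collection_id metadata field are illustrative assumptions):

import json

import gridfs
from pymongo import MongoClient

db = MongoClient("mongodb://sample_mongo:27017")["test"]
fs = gridfs.GridFS(db)  # backed by the fs.files and fs.chunks collections

record = {"name": "drugs", "collection_id": 23, "tokens": []}  # the oversized dict

# put() splits the bytes into 255 kB chunks automatically.
payload = json.dumps(record).encode("utf-8")
file_id = fs.put(payload, filename="drugs_23", collection_id=23)

# Read it back later.
data = json.loads(fs.get(file_id).read().decode("utf-8"))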

Upvotes: 2
