Thrisundar Reddy J

Reputation: 73

How to handle the "document size exceeds 16MB" error when inserting a document into a collection in MongoDB

Can anyone please suggest how to handle the "document size exceeds 16MB" error when inserting a document into a collection in MongoDB? I found some solutions, such as GridFS. GridFS can handle this problem, but I need a solution without using GridFS. Is there any way to make the document smaller or split it into subdocuments? If yes, how can we achieve that?

from pymongo import MongoClient

conn = MongoClient("mongodb://sample_mongo:27017")
db_conn = conn["test"]
db_collection = db_conn["sample"]

# the size of record is 23MB

record = {
    "name": "drugs",
    "collection_id": 23,
    "timestamp": 1515065002,
    "tokens": [],          # list of strings
    "tokens_missing": [],  # list of strings
    "token_mapping": {}    # dict of transformed tokens
}

db_collection.insert(record, check_keys=False)

I got the error DocumentTooLarge: BSON document too large. In MongoDB, the maximum BSON document size is 16 megabytes.

  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 2501, in insert
check_keys, manipulate, write_concern)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 575, in _insert
check_keys, manipulate, write_concern, op_id, bypass_doc_val)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 556, in _insert_one
check_keys=check_keys)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 482, in command
self._raise_connection_failure(error)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 610, in _raise_connection_failure
raise error
  DocumentTooLarge: BSON document too large (22451007 bytes) - the connected server supports BSON document sizes up to 16793598 bytes.

Upvotes: 7

Views: 13244

Answers (2)

kevinadi

Reputation: 13785

The quick answer is no, you cannot get around the 16 MB BSON size limitation. If you hit this limit, you will need to explore alternatives such as GridFS or a different schema design for your documents.

I would start by asking a series of questions to determine the focus of your design, such as:

  1. You have fields called tokens, tokens_missing, and token_mapping. I imagine these fields are very large individually, and putting all three into one document pushes it over 16 MB. Is it possible to split this document into three collections instead (see the sketch after this list)?

  2. What is your application's access pattern? Which fields do you need to access all the time? Which fields do you access less often? You can split the document into different collections based on those patterns.

  3. Bear in mind the need to index the documents, since MongoDB's performance is highly tied to good indexes that support your queries. You cannot index two arrays in a single compound index. There is more information in Multikey Indexes.

  4. If you need to combine all the related data in a query, MongoDB 3.2 and newer provides the $lookup aggregation stage, which is similar to SQL's left outer join (see the $lookup sketch below).
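To make points 1 and 2 concrete, here is a minimal sketch of one possible split, using the record from the question. The collection names (sample_meta, sample_tokens) and the batch size are assumptions for illustration, not something prescribed by MongoDB:

from pymongo import MongoClient

db = MongoClient("mongodb://sample_mongo:27017")["test"]

record = {
    "name": "drugs",
    "collection_id": 23,
    "timestamp": 1515065002,
    "tokens": [],          # large list of strings
    "tokens_missing": [],  # large list of strings
    "token_mapping": {}    # large dict of transformed tokens
}

# Keep the small metadata in one document...
meta = {k: record[k] for k in ("name", "collection_id", "timestamp")}
db["sample_meta"].insert_one(meta)

# ...and spread each large array over many small documents,
# one batch per document, keyed back to the parent by collection_id.
BATCH = 10000  # hypothetical; tune so each document stays well under 16 MB
batches = [
    {"collection_id": record["collection_id"],
     "seq": i // BATCH,
     "tokens": record["tokens"][i:i + BATCH]}
    for i in range(0, len(record["tokens"]), BATCH)
]
if batches:
    db["sample_tokens"].insert_many(batches)

The same batching would apply to tokens_missing and token_mapping (the mapping could be flattened into one small document per batch of key/value pairs).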

Unlike SQL's normal-form schema design, MongoDB's schema design is based on your application's access pattern. The 16 MB limit is there to let you know that the design is probably not optimal, since such large documents are detrimental to performance, difficult to update, etc. Typically, it's better to have many small documents than a few gigantic ones.
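As a hedged illustration of point 4, a $lookup over the hypothetical split collections from the sketch above might look like this:

from pymongo import MongoClient

db = MongoClient("mongodb://sample_mongo:27017")["test"]

# Join each metadata document with its token batches at query time.
pipeline = [
    {"$match": {"collection_id": 23}},
    {"$lookup": {
        "from": "sample_tokens",       # hypothetical collection from the sketch above
        "localField": "collection_id",
        "foreignField": "collection_id",
        "as": "token_batches"
    }}
]
for doc in db["sample_meta"].aggregate(pipeline):
    print(doc["name"])  # each doc now carries its token_batches array

Note that each document produced by the aggregation pipeline is itself capped at 16 MB, so a $lookup that folds every batch back into a single result document would reproduce the original error; for very large joins, reading the batches with a plain find() cursor is safer.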

More examples can be found in Data Model Design and Data Model Examples and Patterns.

Upvotes: 1

Clement Amarnath

Reputation: 5466

The maximum BSON document size is 16 megabytes. To store documents larger than the maximum size, MongoDB provides the GridFS API.

GridFS is a specification for storing and retrieving files that exceed the BSON document size limit of 16 MB. GridFS stores a large file by dividing it into parts, or chunks, and storing each chunk as a separate document. The default GridFS chunk size is 255 kB. GridFS uses two collections to store files: one collection stores the file chunks, and the other stores file metadata.
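For completeness, a minimal PyMongo sketch of the GridFS approach, assuming the oversized record is serialized to JSON first (the filename and the extra collection_id metadata field are illustrative assumptions):

import json

import gridfs
from pymongo import MongoClient

db = MongoClient("mongodb://sample_mongo:27017")["test"]
fs = gridfs.GridFS(db)  # backed by the fs.files and fs.chunks collections

record = {"name": "drugs", "collection_id": 23, "tokens": []}  # the oversized dict

# put() splits the bytes into 255 kB chunks automatically.
payload = json.dumps(record).encode("utf-8")
file_id = fs.put(payload, filename="drugs_23", collection_id=23)

# Read it back later.
data = json.loads(fs.get(file_id).read().decode("utf-8"))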

Upvotes: 2
