AnhNg

Reputation: 199

How to dump a collection to a JSON file using pymongo

I am trying to dump a collection to a .json file, but after looking through the pymongo tutorial I cannot find anything that relates to it.

Tutorial link: https://api.mongodb.com/python/current/tutorial.html

Upvotes: 18

Views: 28124

Answers (7)

Anonyo Noor

Reputation: 533

I liked @robscott's answer as it seemed the most intuitive while also not producing invalid JSON. My use case also specifically needed to iterate over each document.

Here is a simplified version of that, as it requires no document count. Instead of adding the comma after each dumped document, it adds it before.

The idea is the same though: a comma is written before every document except the first.

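import json

# db is assumed to be an existing pymongo Database handle.
filter = {"type": "something"}
type_documents = db['cluster'].find(filter)

with open("type_documents.json", "w") as file:
    file.write('[')
    for i, document in enumerate(type_documents, 1):
        # Write the separator before every document except the first.
        if i != 1:
            file.write(',')
        # default=str stringifies values json cannot encode (e.g. ObjectId).
        file.write(json.dumps(document, default=str))
    file.write(']')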
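
The same comma-before-each-item idea can be tried without a database. The sketch below uses a hypothetical helper, write_json_array (not part of pymongo), that takes any iterable of dicts and a writable file object. A generator stands in for the cursor, and default=str is carried over from the snippet above.

```python
import io
import json


def write_json_array(documents, file):
    """Stream an iterable of dicts to `file` as a single JSON array.

    Hypothetical helper for illustration; `documents` could be a pymongo cursor.
    """
    file.write('[')
    for i, document in enumerate(documents):
        if i > 0:
            # The separator goes before every item except the first,
            # so the total number of documents is never needed.
            file.write(',')
        file.write(json.dumps(document, default=str))
    file.write(']')


# Usage: an in-memory buffer and a generator stand in for the file and cursor.
buffer = io.StringIO()
write_json_array(({"n": n} for n in range(3)), buffer)
parsed = json.loads(buffer.getvalue())
```

Because the comma is tied to the start of each item rather than the end, an empty result set simply yields [] with no special case.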

Upvotes: 1

Livne Rosenblum

Reputation: 306

Using json_util from the bson package (installed with pymongo):

from bson.json_util import dumps
from pymongo import MongoClient

db_client = MongoClient(mongo_connection_string)
db = db_client.db_name  # placeholder: replace db_name with your database
collection = db.collection_name
# Open the file once; dumps() serializes the whole cursor as a JSON array.
with open("collection.json", "w") as file:
    file.write(dumps(collection.find()))

Upvotes: 0

garyj

Reputation: 1432

The accepted solution produces invalid JSON. It leaves a trailing comma , before the closing square bracket ]. The JSON spec does not allow trailing commas. See this answer and this reference.

To build on the accepted solution I used the following:

from bson.json_util import dumps
from pymongo import MongoClient
import json

if __name__ == '__main__':
    client = MongoClient()
    db = client.db_name
    collection = db.collection_name
    cursor = collection.find({})
    with open('collection.json', 'w') as file:
        json.dump(json.loads(dumps(cursor)), file)
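
To check the trailing-comma claim, Python's standard-library parser can be run on both shapes of output. This is a small sketch with made-up sample strings and a hypothetical helper, is_valid_json. It does not touch MongoDB.

```python
import json

# Made-up samples: the first mimics output with a comma after the last item.
with_trailing_comma = '[{"test": "test"},]'
without_trailing_comma = '[{"test": "test1"},{"test": "test2"}]'


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


trailing_ok = is_valid_json(with_trailing_comma)
clean_ok = is_valid_json(without_trailing_comma)
```

A check like this can be run on an exported file before handing it to another tool.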

Upvotes: 20

Aseem Jain

Reputation: 351

"""
@Author: Aseem Jain
@profile: https://www.linkedin.com/in/premaseem/

"""
import os
import pymongo
from bson.json_util import dumps

# configure credentials / db name
db_user = os.environ["MONGO_ATLAS_USER"]
db_pass = os.environ["MONGO_ATLAS_PASSWORD"]
db_name = "sample_mflix"

connection_string = f"mongodb+srv://{db_user}:{db_pass}@sharedcluster.lv3wx.mongodb.net/{db_name}?retryWrites=true&w=majority"

client = pymongo.MongoClient(connection_string)
db = client[db_name]

# create the backup directory, named after the database
os.makedirs(db_name, exist_ok=True)

# list all collections in the database
collections = db.list_collection_names()

# dump every collection in the database
for name in collections:
    print("exporting data for collection", name)
    data = list(db[name].find())
    # serialize with bson's dumps so the file holds valid JSON;
    # str(data) would write a Python repr instead
    with open(f"{db_name}/{name}.json", "w") as writer:
        writer.write(dumps(data))

Upvotes: 1

kamillitw

Reputation: 472

Just get all documents and save them to a file, e.g.:

from bson.json_util import dumps
from pymongo import MongoClient

if __name__ == '__main__':
    client = MongoClient()
    db = client.db_name
    collection = db.collection_name
    cursor = collection.find({})
    with open('collection.json', 'w') as file:
        file.write('[')
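        for document in cursor:
            file.write(dumps(document))
            # Note: this also writes a comma after the last document,
            # leaving a trailing comma before ']' that strict JSON parsers reject.
            file.write(',')
        file.write(']')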

Upvotes: 15

Naiara Andrade

Reputation: 155

Complementing @kamillitw's answer, I use the number of documents to write a correct JSON file. I get the count once and check it before writing each comma:

import json
from bson import json_util

def writeToJSONFile(collection):
    cursor = collection.find({})
    # Cursor.count() was removed in PyMongo 4; count_documents() replaces it.
    num_max = collection.count_documents({})
    with open("collection.json", "w") as file:
        file.write('[')
        for qnt_cursor, document in enumerate(cursor, 1):
            file.write(json.dumps(document, indent=4, default=json_util.default))
            # Write a comma after every document except the last.
            if qnt_cursor < num_max:
                file.write(',')
        file.write(']')

So the JSON file will be correct in the end. Before, it was writing [{"test": "test"},]; now it writes [{"test":"test1"},{"test":"test2"}].

Upvotes: 1

robscott

Reputation: 91

Here's another way to avoid writing a , before the closing square bracket. It also uses with open to save some space.

import json

# db is assumed to be an existing pymongo Database handle.
filter = {"type": "something"}
type_documents = db['cluster'].find(filter)
type_documents_count = db['cluster'].count_documents(filter)

with open("type_documents.json", "w") as file:
    file.write('[')
    # Start from one as type_documents_count also starts from 1.
    for i, document in enumerate(type_documents, 1):
        file.write(json.dumps(document, default=str))
        if i != type_documents_count:
            file.write(',')
    file.write(']')

It skips the comma when the iteration number equals the number of documents, that is, after the last document it saves.

Upvotes: 2
