Reputation: 199
I am trying to dump a collection to a .json file, but after looking at the pymongo tutorial I cannot find anything that relates to it.
Tutorial link: https://api.mongodb.com/python/current/tutorial.html
Upvotes: 18
Views: 28124
Reputation: 533
I liked @robscott's answer as it seemed the most intuitive while also not creating invalid JSON. My use case also specifically needed to iterate over each document.
Here is a simplified version that requires no document count: instead of adding the comma after each dump, it writes it before every document except the first.
The idea is the same, though, as it still adds every comma but the first.
import json

# assumes `db` is an existing pymongo Database handle
filter = {"type": "something"}
type_documents = db['cluster'].find(filter)

with open("type_documents.json", "w") as file:
    file.write('[')
    for i, document in enumerate(type_documents, 1):
        # write the separator before every document except the first
        if i != 1:
            file.write(',')
        file.write(json.dumps(document, default=str))
    file.write(']')
Upvotes: 1
Reputation: 306
Using pymongo's json_util:
from bson.json_util import dumps
from pymongo import MongoClient

db_client = MongoClient(mongo_connection_string)  # connection string defined elsewhere
db = db_client.db_name
collection = db.collection_name

# one extended-JSON document per line; the file is opened once, not per document
with open("collection.json", "w") as file:
    for document in collection.find():
        file.write(dumps(document))
        file.write("\n")
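To read that dump back, the matching loads from bson.json_util can parse each line (a small sketch, assuming the file was written one document per line as above):
from bson.json_util import loads

# parse the dump back: one extended-JSON document per line, matching the writer above
with open("collection.json") as file:
    documents = [loads(line) for line in file]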
Upvotes: 0
Reputation: 1432
The accepted solution produces invalid JSON: it leaves a trailing comma before the closing square bracket (]). The JSON spec does not allow trailing commas. See this answer and this reference.
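For example, Python's own parser rejects such a file:
import json

# a trailing comma before the closing bracket is invalid JSON
json.loads('[{"test": "test"},]')   # raises json.decoder.JSONDecodeError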
To build on the accepted solution I used the following:
from bson.json_util import dumps
from pymongo import MongoClient
import json

if __name__ == '__main__':
    client = MongoClient()
    db = client.db_name
    collection = db.collection_name
    cursor = collection.find({})
    with open('collection.json', 'w') as file:
        # dumps() turns the whole cursor into one JSON array string;
        # round-tripping through json.loads/json.dump writes it back out as plain JSON
        json.dump(json.loads(dumps(cursor)), file)
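If you want, you can sanity-check the export afterwards (a minimal sketch, assuming the collection.json produced above):
import json

# the file should parse as a single JSON array of documents
with open('collection.json') as file:
    documents = json.load(file)
print(len(documents), "documents exported")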
Upvotes: 20
Reputation: 351
"""
@Author: Aseem Jain
@profile: https://www.linkedin.com/in/premaseem/
"""
import os
import pymongo
# configure credentials / db name
db_user = os.environ["MONGO_ATLAS_USER"]
db_pass = os.environ["MONGO_ATLAS_PASSWORD"]
db_name = "sample_mflix"
connection_string = f"mongodb+srv://{db_user}:{db_pass}@sharedcluster.lv3wx.mongodb.net/{db_name}?retryWrites=true&w=majority"
client = pymongo.MongoClient(connection_string)
db = client[db_name]
# create database back directory with db_name
os.makedirs(db_name, exist_ok=True)
# list all tables in database
tables = db.list_collection_names()
# dump all tables in db
for table in tables:
print("exporting data for table", table )
data = list(db[table].find())
# write data in json file
with open(f"{db.name}/{table}.json","w") as writer:
writer.write(str(data))
exit(0)
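Note that str(data) writes Python's repr (single quotes, ObjectId(...)), not strict JSON. If valid JSON output is needed, one possible tweak is to serialize with bson.json_util.dumps instead (a sketch reusing the db handle from the script above):
from bson.json_util import dumps

for table in db.list_collection_names():
    with open(f"{db.name}/{table}.json", "w") as writer:
        # dumps() converts ObjectId / datetime into MongoDB extended JSON
        writer.write(dumps(db[table].find()))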
Upvotes: 1
Reputation: 472
Just get all documents and save them to a file, e.g.:
from bson.json_util import dumps
from pymongo import MongoClient

if __name__ == '__main__':
    client = MongoClient()
    db = client.db_name
    collection = db.collection_name
    cursor = collection.find({})
    with open('collection.json', 'w') as file:
        file.write('[')
        for document in cursor:
            file.write(dumps(document))
            file.write(',')
        file.write(']')
Upvotes: 15
Reputation: 155
Complementing @kamilitw's answer, I use the length of the cursor to build the JSON file correctly, with count() and if/else:
import json
from bson import json_util

def writeToJSONFile(collection):
    cursor = collection.find({})
    file = open("collection.json", "w")
    file.write('[')
    qnt_cursor = 0
    for document in cursor:
        qnt_cursor += 1
        num_max = cursor.count()  # note: Cursor.count() was removed in PyMongo 4
        if num_max == 1:
            file.write(json.dumps(document, indent=4, default=json_util.default))
        elif num_max >= 1 and qnt_cursor <= num_max - 1:
            file.write(json.dumps(document, indent=4, default=json_util.default))
            file.write(',')
        elif qnt_cursor == num_max:
            file.write(json.dumps(document, indent=4, default=json_util.default))
    file.write(']')
    return file
This way the JSON file ends up valid: instead of writing [{"test": "test"},] as before, it now writes [{"test": "test1"},{"test": "test2"}].
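For reference, it can be called like this (db_name and collection_name are placeholders); keep in mind that Cursor.count() was removed in PyMongo 4, where collection.count_documents({}) is the replacement:
from pymongo import MongoClient

# hypothetical usage; replace db_name / collection_name with real names
client = MongoClient()
writeToJSONFile(client.db_name.collection_name)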
Upvotes: 1
Reputation: 91
Here's another way of not writing a comma before the closing square bracket. It also uses with open to save some space.
import json

# assumes `db` is an existing pymongo Database handle
filter = {"type": "something"}
type_documents = db['cluster'].find(filter)
type_documents_count = db['cluster'].count_documents(filter)

with open("type_documents.json", "w") as file:
    file.write('[')
    # enumerate from 1 so i can be compared against type_documents_count
    for i, document in enumerate(type_documents, 1):
        file.write(json.dumps(document, default=str))
        if i != type_documents_count:
            file.write(',')
    file.write(']')
It simply skips the comma when the iteration count equals the number of documents, i.e. for the last document it saves.
Upvotes: 2