Reputation: 121
How to export pymongo collection to a JSON file, I have a huge collection which has approximately around 1 GB of data, I need an efficient way to export the data and create a JSON file out of it. I am using the below code, as written in one of the stack overflow answers.
def getJSONFromDB():
db = GetMongo_client()
collection = db['collection_name']
cursor = collection.find({})
file = open("collection.json", "w")
file.write('[')
for document in cursor:
file.write(json.dumps(document))
file.write(',')
file.write(']')
But it gives me the following error: TypeError: Object of type ObjectId is not JSON serializable
Upvotes: 3
Views: 2273
Reputation: 463
Hi it's end to end solution
Step 1: it's export all needs collections to json file
Step 2: it's import this json files to other db
import pymongo
import os
import json
from bson.json_util import dumps
import sys
import Consts
def replace_word(infile,old_word,new_word):
if not os.path.isfile(infile):
print ("Error on replace_word, not a regular file: "+infile)
sys.exit(1)
f1=open(infile,'r').read()
f2=open(infile,'w')
m=f1.replace(old_word,new_word)
f2.write(m)
def replase_coletion(db_name_list,client_prod,client_stg):
for db_name in db_name_list:
db_prod = client_prod[db_name]
os.makedirs(db_name, exist_ok=True)
tables_prod = db_prod.list_collection_names()
print(tables_prod)
for table in tables_prod:
print("exporting data for table", table )
data = list(db_prod[table].find())
json_data = dumps(data, indent=4)
# write data in json file
with open(f"{db_prod.name}/{table}.json", 'w') as file:
file.write(json_data)
replace_word(db_prod.name+"/"+table+".json","$oid","oid")
db_stg = client_stg[db_name]
tables_stg = db_stg.list_collection_names()
print(tables_stg)
for table in tables_stg:
db_stg.drop_collection(table)
for table in tables_prod:
try:
with open(f"{db_prod.name}/{table}.json") as read_file:
data = json.load(read_file)
print(data)
client_stg[db_name][table].insert_many(data)
except:
print("empty list")
db_stg = client_stg[db_name]
tables_stg = db_stg.list_collection_names()
print(tables_stg)
if __name__ == "__main__":
db_name_list = Consts.db_lists
client_prod = pymongo.MongoClient(Consts.from_bd)
client_stg = pymongo.MongoClient(Consts.to_db)
replase_coletion(db_name_list, client_prod, client_stg)
Upvotes: 0
Reputation: 27
The pymongo documentation you pointed is obsolete. If you're using version 1.7 I recommend updating. With a more recent version you can do this:
from bson.json_util import dumps
dumps(l)
https://pymongo.readthedocs.io/en/stable/api/bson/json_util.html
Side answer: u'name', u'date', u'_id' etc are the names of the fields of the document on the database.
Upvotes: 1