Gyan Prakash
Gyan Prakash

Reputation: 121

Export pymongo collection to a JSON file

How to export pymongo collection to a JSON file, I have a huge collection which has approximately around 1 GB of data, I need an efficient way to export the data and create a JSON file out of it. I am using the below code, as written in one of the stack overflow answers.

def getJSONFromDB():
    db = GetMongo_client()
    collection = db['collection_name']
    cursor = collection.find({})
    file = open("collection.json", "w")
    file.write('[')
    for document in cursor:
      file.write(json.dumps(document))
      file.write(',')
    file.write(']') 

But it gives me the following error: TypeError: Object of type ObjectId is not JSON serializable

Upvotes: 3

Views: 2273

Answers (2)

Robert A
Robert A

Reputation: 463

Hi it's end to end solution

Step 1: it's export all needs collections to json file

Step 2: it's import this json files to other db

import pymongo
import os
import json
from bson.json_util import dumps
import sys
import Consts


def replace_word(infile,old_word,new_word):
    if not os.path.isfile(infile):
        print ("Error on replace_word, not a regular file: "+infile)
        sys.exit(1)
    f1=open(infile,'r').read()
    f2=open(infile,'w')
    m=f1.replace(old_word,new_word)
    f2.write(m)

def replase_coletion(db_name_list,client_prod,client_stg):
    for db_name in db_name_list:
        db_prod = client_prod[db_name]
        os.makedirs(db_name, exist_ok=True)
        tables_prod = db_prod.list_collection_names()
        print(tables_prod)
        for table in tables_prod:
            print("exporting data for table", table )
            data = list(db_prod[table].find())
            json_data = dumps(data, indent=4)
            # write data in json file
            with open(f"{db_prod.name}/{table}.json", 'w') as file:
                file.write(json_data)
                replace_word(db_prod.name+"/"+table+".json","$oid","oid")
        db_stg = client_stg[db_name]
        tables_stg = db_stg.list_collection_names()
        print(tables_stg)
        for table in tables_stg:
           db_stg.drop_collection(table)
        for table in tables_prod:
            try:
                with open(f"{db_prod.name}/{table}.json") as read_file:
                    data = json.load(read_file)
                print(data)
                client_stg[db_name][table].insert_many(data)
            except:
                print("empty list")
        db_stg = client_stg[db_name]
        tables_stg = db_stg.list_collection_names()
        print(tables_stg)

if __name__ == "__main__":
    db_name_list = Consts.db_lists
    client_prod = pymongo.MongoClient(Consts.from_bd)
    client_stg = pymongo.MongoClient(Consts.to_db)
    replase_coletion(db_name_list, client_prod, client_stg)

Upvotes: 0

Shritej Thorve
Shritej Thorve

Reputation: 27

The pymongo documentation you pointed is obsolete. If you're using version 1.7 I recommend updating. With a more recent version you can do this:

from bson.json_util import dumps

dumps(l)

https://pymongo.readthedocs.io/en/stable/api/bson/json_util.html

Side answer: u'name', u'date', u'_id' etc are the names of the fields of the document on the database.

Upvotes: 1

Related Questions