Aleksandar

Reputation: 51

The speed of mongoimport while using --jsonArray is very slow

I have a 15GB file with more than 25 million rows in the following JSON format (which mongodb accepts for importing):

[
    {"_id": 1, "value": "\u041c\..."},
    {"_id": 2, "value": "\u041d\..."},
    ...
]

When I try to import it into mongodb with the following command, I get a speed of only about 50 rows per second, which is really slow for me.

mongoimport --db wordbase --collection sentences --type json --file C:\Users\Aleksandar\PycharmProjects\NLPSeminarska\my_file.json --jsonArray

When I tried to insert the data into the collection using Python with pymongo, the speed was even worse. I also tried increasing the priority of the process, but it didn't make any difference.
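By "inserting with pymongo" I mean roughly the following (a simplified sketch: the localhost connection string, the pymongo 3.x API, and the one-document-per-line layout of the sample above are assumptions):

import json
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["wordbase"]["sentences"]

with open("my_file.json", "r", encoding="utf-8") as f:
    for line in f:
        doc = line.strip().rstrip(",")
        if doc in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        coll.insert_one(json.loads(doc))  # one round-trip per document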

The next thing I tried was the same import but without --jsonArray, and although I got a big speed increase (~4000 docs/sec), it failed with an error saying that the BSON representation of the supplied JSON is too large.

I also tried splitting the file into 5 separate files and importing them from separate consoles into the same collection, but the speed of all of them dropped to about 20 documents/sec.
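The splitting itself was along these lines (again a sketch: it assumes one document per line between the brackets, as in the sample above, and the part file names are made up). Each part is itself a valid JSON array, so each console ran the same mongoimport command against its own part file:

N = 5
outs = [open("part_%d.json" % i, "w", encoding="utf-8") for i in range(N)]
written = [0] * N

# each part file gets its own enclosing brackets
for out in outs:
    out.write("[\n")

with open("my_file.json", "r", encoding="utf-8") as src:
    i = 0
    for line in src:
        doc = line.strip().rstrip(",")
        if doc in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        k = i % N  # deal documents round-robin across the parts
        if written[k]:
            outs[k].write(",\n")
        outs[k].write(doc)
        written[k] += 1
        i += 1

for out in outs:
    out.write("\n]\n")
    out.close()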

While searching all over the web I saw that people had speeds of over 8K documents/sec, and I can't see what I'm doing wrong.

Is there a way to speed this up? Or should I convert the whole JSON file to BSON and import it that way, and if so, what is the correct way to do both the conversion and the import?

Huge thanks.

Upvotes: 4

Views: 1937

Answers (1)

nessa.gp

Reputation: 1864

I had the exact same problem with a 160GB dump file. It took me two days to load 3% of the original file with --jsonArray and 15 minutes with these changes.

First, remove the initial [ and trailing ] characters:

sed -i 's/^\[//; s/\]$//' filename.json

Then import without the --jsonArray option:

mongoimport --db "dbname" --collection "collectionname" --file filename.json

If the file is huge, sed will take a really long time and you may run into storage problems, because sed -i rewrites the file through a temporary copy. You can use this C program instead, which shifts the contents in place (not written by me, all glory to @guillermobox):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    FILE *f;
    const size_t buffersize = 2048;
    size_t length, filesize, position;
    char buffer[buffersize + 1];

    if (argc < 2) {
        fprintf(stderr, "Please provide file to mongofix!\n");
        exit(EXIT_FAILURE);
    }

    f = fopen(argv[1], "r+");
    if (f == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }

    /* get the full filesize */
    fseek(f, 0, SEEK_END);
    filesize = ftell(f);

    /* ignore the first character (the opening '[') */
    fseek(f, 1, SEEK_SET);

    while (1) {
        /* read chunks of buffersize size */
        length = fread(buffer, 1, buffersize, f);
        position = ftell(f);

        /* write the same chunk back, one character earlier */
        fseek(f, position - length - 1, SEEK_SET);
        fwrite(buffer, 1, length, f);

        /* return to the reading position */
        fseek(f, position, SEEK_SET);

        /* we have finished when not all the buffer is read */
        if (length != buffersize)
            break;
    }

    /* truncate the file by two characters: the closing ']' and the
       stale byte left behind by the shift */
    ftruncate(fileno(f), filesize - 2);

    fclose(f);

    return 0;
}
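To use it, save it as e.g. mongofix.c, compile it, and run it on the dump (gcc is just an assumption here; any C compiler should do):

gcc -O2 -o mongofix mongofix.c
./mongofix filename.json

Then run the mongoimport command above on the result. Note that the program edits the file in place, so keep a backup if the dump is hard to regenerate.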

P.S.: I don't have the reputation to suggest migrating this question, but I think this could be helpful.

Upvotes: 2
