Reputation: 71
I need to do a bulk insert of documents into my CouchDB database. I'm trying to follow the manual here: http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
Here is my script:
~$ DB="http://localhost:5984/employees"
~$ curl -H "Content-Type:application/json" -d @employees_selfContained.json -vX POST $DB/_bulk_docs
The file employees_selfContained.json is huge (465 MB). I've validated it with JSONLint and found nothing wrong.
Here's curl's verbose output:
* About to connect() to 127.0.0.1 port 5984 (#0)
* Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
> POST /employees/_bulk_docs HTTP/1.1
> User-Agent: curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: 127.0.0.1:5984
> Accept: */*
> Content-Type:application/json
> Content-Length: 439203931
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* Empty reply from server
* Connection #0 to host 127.0.0.1 left intact
curl: (52) Empty reply from server
* Closing connection #0
How can I do a bulk insert from that huge single file? I'd prefer not to split the file into smaller pieces if possible.
EDIT: In case anyone is wondering, I'm trying to convert this schema: http://dev.mysql.com/doc/employee/en/sakila-structure.html into a self-contained document database, with a structure like this:
{
  "docs": [
    {
      "emp_no": ..,
      "birth_date": ..,
      "first_name": ..,
      "last_name": ..,
      "gender": ..,
      "hire_date": ..,
      "titles": [
        {
          "title": ..,
          "from_date": ..,
          "to_date": ..
        },
        {..}
      ],
      "salaries": [
        {
          "salary": ..,
          "from_date": ..,
          "to_date": ..
        },
        {..}
      ],
      "dept_emp": [
        {
          "dept_no": ..,
          "from_date": ..,
          "to_date": ..
        },
        {..}
      ],
      "dept_manager": [
        {
          "dept_no": ..,
          "from_date": ..,
          "to_date": ..
        },
        {..}
      ],
      "departments": [
        {
          "dept_no": ..,
          "dept_name": ..
        },
        {..}
      ]
    },
    {..}
  ]
}
Upvotes: 3
Views: 5876
Reputation: 2659
Don't post the entire 465 MB body in a single request. Loop over the JSON and insert in batches of 10-50k documents per call to `_bulk_docs`.
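A minimal sketch of that loop in Python, using only the standard library. It assumes the database URL and filename from the question, and a batch size of 10k (tune it between 10k and 50k). Note that `json.load` still reads the whole file into memory; a streaming parser such as `ijson` would avoid that, but this keeps the example dependency-free:

```python
import json
import urllib.request

# Assumed from the question; adjust to your setup.
DB = "http://localhost:5984/employees"
BATCH_SIZE = 10000  # tune between 10k and 50k

def batches(seq, size):
    """Yield successive slices of `seq` holding at most `size` items."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def bulk_insert(path):
    # Parse the file once, then POST the docs array in batches so each
    # request body stays small enough for the server to accept.
    with open(path) as f:
        docs = json.load(f)["docs"]
    for batch in batches(docs, BATCH_SIZE):
        body = json.dumps({"docs": batch}).encode("utf-8")
        req = urllib.request.Request(
            DB + "/_bulk_docs",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # CouchDB answers 201 Created per accepted batch

# Usage:
# bulk_insert("employees_selfContained.json")
```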
Upvotes: 1