Reputation: 3892
I have a directory of roughly 45,000 JSON files, currently about 12.8 GB in total. This is website data from Kissmetrics and its structure is detailed here.
The data:
Each file contains multiple JSON documents separated by newlines.
New files will be added every 12 hours.
I want to import this data into MongoDB using mongoimport. I've tried this shell script to make the process easier:
for filename in revisions/*; do
    echo "$filename"
    # Quote the file name so paths containing spaces are passed as one argument.
    mongoimport --host <HOSTNAME>:<PORT> --db <DBNAME> --collection <COLLECTIONNAME> \
        --ssl --sslCAFile ~/mongodb.pem --username <USERNAME> --password <PASSWORD> \
        --authenticationDatabase admin "$filename"
done
This sometimes fails with errors like the following:
2016-06-18T00:31:10.781+0000 using 1 decoding workers
2016-06-18T00:31:10.781+0000 using 1 insert workers
2016-06-18T00:31:10.781+0000 filesize: 113 bytes
2016-06-18T00:31:10.781+0000 using fields:
2016-06-18T00:31:10.822+0000 connected to: <HOSTNAME>:<PORT>
2016-06-18T00:31:10.822+0000 ns: <DBNAME>.<COLLECTION>
2016-06-18T00:31:10.822+0000 connected to node type: standalone
2016-06-18T00:31:10.822+0000 standalone server: setting write concern w to 1
2016-06-18T00:31:10.822+0000 using write concern: w='1', j=false, fsync=false, wtimeout=0
2016-06-18T00:31:10.822+0000 standalone server: setting write concern w to 1
2016-06-18T00:31:10.822+0000 using write concern: w='1', j=false, fsync=false, wtimeout=0
2016-06-18T00:31:10.824+0000 Failed: error processing document #1: invalid character 'l' looking for beginning of value
2016-06-18T00:31:10.824+0000 imported 0 documents
I may run into this error at any point, and from my inspection it is not due to malformed data.
It can happen hours into the import.
Can I parse the error in mongoimport to retry the same document? I don't know if the error will have this same form, so I'm not sure if I can try to handle it in bash. Can I keep track of progress in bash and restart if terminated early? Any suggestions on importing large data of this size or handling the error in shell?
Upvotes: 3
Views: 1653
Reputation: 14068
Typically a command returns a nonzero exit status when it fails (and the codes are hopefully documented on the man page for the command).
So if you want to do something hacky and just retry once:
# Keep the command in an array so arguments with spaces survive intact.
cmd=(mongoimport --foo --bar...)
"${cmd[@]}"
ret=$?
if [ "$ret" -ne 0 ]; then
    echo "retrying..."
    if ! "${cmd[@]}"; then
        echo "failed again. Sadness." >&2
        exit 1
    fi
fi
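If one retry turns out not to be enough, the same idea can be wrapped in a small function. This is only a sketch: the name retry, the attempt limit, and the RETRY_DELAY variable are made up for illustration and are not part of mongoimport.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits with status 0
# or MAX_ATTEMPTS attempts have been made.
retry() {
    local max_attempts=$1
    shift
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying..." >&2
        attempt=$((attempt + 1))
        # RETRY_DELAY is an assumed knob: seconds to wait between attempts.
        sleep "${RETRY_DELAY:-1}"
    done
}
```

Inside the loop from the question it would be called as retry 3 mongoimport --foo --bar... "$filename", and a nonzero return tells you that file never went through.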
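Or, if you really need what mongoimport outputs, capture it like this:
results=$(mongoimport --foo --bar... 2>&1)
Now the variable $results will contain what was written to stdout. The 2>&1 redirects stderr into it as well, which you will probably need, since the log lines shown in the question most likely go to stderr.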
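To keep track of progress and pick up where you left off after an early termination, one option is to record each successfully imported file in a log and skip those files on the next run. The sketch below assumes a hypothetical helper named import_all and a plain text log with one path per line; the import command is passed in as arguments, so nothing here depends on a particular mongoimport flag.

```shell
#!/usr/bin/env bash
# import_all DIR DONE_LOG COMMAND [ARGS...]
# Hypothetical helper: runs "COMMAND ARGS... FILE" for every regular file
# in DIR that is not yet listed in DONE_LOG, and appends each file that
# succeeds to DONE_LOG. Returns 1 if any file failed.
import_all() {
    local dir=$1 done_log=$2
    shift 2
    local file output failed=0
    touch "$done_log"
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        # Skip files that an earlier run already imported.
        if grep -Fxq -- "$file" "$done_log"; then
            continue
        fi
        # Capture stdout and stderr together so failures can be reported.
        if output=$("$@" "$file" 2>&1); then
            printf '%s\n' "$file" >> "$done_log"
        else
            printf 'FAILED: %s\n%s\n' "$file" "$output" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

It could be called as import_all revisions imported.log mongoimport --foo --bar... and simply re-run after a crash or from cron every 12 hours. Note that retrying a file that was partially imported can insert some documents twice unless they carry a unique _id.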
Upvotes: 1