Jared

Reputation: 3892

Error Handling on mongoimport

I have a directory of roughly 45,000 JSON files, currently totaling about 12.8 GB. This is website data from Kissmetrics, and its structure is detailed here.

The data:

- Each file contains multiple JSON documents separated by newlines
- New files will be added every 12 hours

I want to import this data into MongoDB using mongoimport. I've tried this shell script to make the process easier:

for filename in revisions/*; do
  echo "$filename"
  mongoimport --host <HOSTNAME>:<PORT> --db <DBNAME> --collection <COLLECTIONNAME> \
      --ssl --sslCAFile ~/mongodb.pem --username <USERNAME> --password <PASSWORD> \
      --authenticationDatabase admin "$filename"
done

This sometimes fails with errors like:

2016-06-18T00:31:10.781+0000    using 1 decoding workers
2016-06-18T00:31:10.781+0000    using 1 insert workers
2016-06-18T00:31:10.781+0000    filesize: 113 bytes
2016-06-18T00:31:10.781+0000    using fields:
2016-06-18T00:31:10.822+0000    connected to: <HOSTNAME>:<PORT>
2016-06-18T00:31:10.822+0000    ns: <DBNAME>.<COLLECTION>
2016-06-18T00:31:10.822+0000    connected to node type: standalone
2016-06-18T00:31:10.822+0000    standalone server: setting write concern w to 1
2016-06-18T00:31:10.822+0000    using write concern: w='1', j=false, fsync=false, wtimeout=0
2016-06-18T00:31:10.822+0000    standalone server: setting write concern w to 1
2016-06-18T00:31:10.822+0000    using write concern: w='1', j=false, fsync=false, wtimeout=0
2016-06-18T00:31:10.824+0000    Failed: error processing document #1: invalid character 'l' looking for beginning of value
2016-06-18T00:31:10.824+0000    imported 0 documents

I will potentially run into this error, and from my inspection it is not due to malformed data.

The error may happen hours into the import.

Can I parse the error output from mongoimport and retry the same document? I don't know whether the error will always take this form, so I'm not sure I can handle it in bash. Can I keep track of progress in bash and restart the import if it terminates early? Any suggestions for importing data of this size, or for handling the error in the shell?

Upvotes: 3

Views: 1653

Answers (1)

Tom Harrison

Reputation: 14068

Typically a given command will return an error code when it fails (and these are hopefully documented in the man page for the command).

So if you want to do something hacky and just retry once:

cmd="mongoimport --foo --bar..."
$cmd
ret=$?
if [ $ret -ne 0 ]; then
  echo "retrying..."
  $cmd
  if [ $? -ne 0 ]; then
    echo "failed again. Sadness."
    exit 1
  fi
fi
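To also cover the restart question: below is a sketch of per-file retry combined with a progress log, so that rerunning the script after an interruption skips files that have already imported. `import_with_retry` and `IMPORT_CMD` are hypothetical names for this sketch; in practice `IMPORT_CMD` would hold your full `mongoimport --host ...` command line.

```shell
# Sketch: retry a single file once, and append successes to a progress
# log so a restarted run can skip files that already imported.
# import_with_retry and IMPORT_CMD are hypothetical names; set IMPORT_CMD
# to your full "mongoimport --host ..." command line.
import_with_retry() {
  local filename=$1 log=${2:-imported.log}
  # Restart support: skip files already recorded in the progress log.
  if grep -qxF "$filename" "$log" 2>/dev/null; then
    echo "skipping $filename (already imported)"
    return 0
  fi
  # Run once; on failure, retry once. $IMPORT_CMD is deliberately
  # unquoted so a multi-word command splits into command + arguments.
  if $IMPORT_CMD "$filename" || { echo "retrying $filename"; $IMPORT_CMD "$filename"; }; then
    echo "$filename" >> "$log"
  else
    echo "failed twice: $filename" >&2
    return 1
  fi
}
```

The import loop then becomes `for filename in revisions/*; do import_with_retry "$filename"; done`, and rerunning the script after a crash resumes where it left off.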

Or if you really need what mongoimport outputs, capture it like this

results=$(mongoimport --foo --bar...)

Now the variable $results will contain what was written to stdout. You might have to redirect stderr as well.
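Since mongoimport appears to write its progress lines to stderr (that's where the log output in the question comes from), capturing stdout alone may miss them. A small demo of folding stderr into the capture; the failing `ls` path here is just an illustrative stand-in for the real command:

```shell
# Capture stdout and stderr together; the exit status is still in $?.
results=$(ls /no/such/path/for-demo 2>&1)
status=$?
echo "exit status: $status"
echo "captured: $results"
```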

Upvotes: 1
