Michael

Reputation: 460

Read values for a MongoDB query from a file

I'm trying to query all documents from a MongoDB collection that match criteria stored in a file.

criteria-file.txt:

value1
value2
value3
...

Currently I'm building the query like this:

built-test.js.sh:

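#!/bin/bash
# Build test.js containing a single $in query from the values in criteria-file.txt.
echo 'db.collection.find({keyfield: {$in:[' > test.js
# Read the file directly instead of piping it through cat; -r keeps backslashes literal.
while read -r i
do
    echo "\"$i\"," >> test.js
done < criteria-file.txt
echo ']}})' >> test.js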

The query document is currently well under 16MB, but I wonder whether there is a more elegant and efficient way, especially because over time the query document will most probably grow beyond 16MB. I'm eager to hear your suggestions.

BTW, I was surprised that with those 25K criteria values, searching a collection of currently 200 million entries takes only a bit over a minute, and the CPU load doesn't seem to be too bad.

Thanks!

Upvotes: 0

Views: 2103

Answers (2)

prasad_

Reputation: 14287

Read the file with the mongo shell's native cat() method and split it into an array of criteria values. Then loop over the array, query the matching documents for each value, and collect them in a single array; this will be your list of matches.

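// Read the whole file as one string (cat() is a native method of the mongo shell).
var criteria_file = cat("criteria-file.txt");
// One value per line; drop empty lines such as the one after a trailing newline.
var criteria_array = criteria_file.split("\n").filter(line => line.length > 0);

var result_ids_arr = [ ];

for (let value of criteria_array) {
    // Fetch only the _id of each document matching this value.
    let id_arr = db.collection.find( { keyfield: value }, { _id: 1} ).toArray();
    result_ids_arr = result_ids_arr.concat(id_arr);
}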

The result is an array of documents holding the _id values, e.g.: [ { "_id" : 11 }, { "_id" : 34 }, ... ]

All this JavaScript can be run from the command prompt or from within the mongo shell using load().

Upvotes: 1

Joe Drumgoole

Reputation: 1348

Split the criteria file into chunks, ensuring that the query document built from each chunk doesn't exceed 16MB.

Run the same query on each chunk; the chunk queries can now also be run in parallel.
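A minimal sketch of the chunking idea, assuming the legacy mongo shell (where cat() and printjson() exist). The helper names parseCriteria and chunkValues and the chunk size of 10000 are made up for illustration; choose a size that keeps each $in array well under 16MB for your value lengths. For simplicity this runs the chunks one after another, not in parallel.

```javascript
// Turn raw file contents into a clean list of criteria values
// (handles Windows line endings, surrounding whitespace and empty lines).
function parseCriteria(text) {
    return text.split(/\r?\n/).map(s => s.trim()).filter(s => s.length > 0);
}

// Split an array into consecutive chunks of at most `size` elements.
function chunkValues(values, size) {
    const chunks = [];
    for (let i = 0; i < values.length; i += size) {
        chunks.push(values.slice(i, i + size));
    }
    return chunks;
}

// This part only runs inside the mongo shell, where `db` and `cat` are defined.
if (typeof db !== "undefined" && typeof cat === "function") {
    const values = parseCriteria(cat("criteria-file.txt"));
    // 10000 is an arbitrary, assumed chunk size.
    for (const chunk of chunkValues(values, 10000)) {
        db.collection.find({ keyfield: { $in: chunk } }).forEach(printjson);
    }
}
```

Saved to a file, this could be run with load() like the script in the other answer. To run chunks in parallel you would start several shells or clients, each working on its own subset of chunks.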

If you want to get extra fancy, you can use the aggregation pipeline to do a $match query and send the output of each query to a single results collection using $merge.
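As a rough sketch of that pipeline: the helper name buildMergePipeline and the target collection name "results" are assumptions for illustration, and keyfield is the field name from the question. Each chunk of criteria values gets its own pipeline, and all of them write into the same collection.

```javascript
// Build a pipeline that matches one chunk of criteria values and
// merges the matching documents into `targetCollection`.
function buildMergePipeline(chunk, targetCollection) {
    return [
        { $match: { keyfield: { $in: chunk } } },
        {
            $merge: {
                into: targetCollection,   // hypothetical results collection
                on: "_id",
                whenMatched: "replace",   // overwrite a document that is already there
                whenNotMatched: "insert"  // otherwise add it
            }
        }
    ];
}

// This part only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    db.collection.aggregate(buildMergePipeline(["value1", "value2"], "results"));
}
```

Note that the $merge stage needs MongoDB 4.2 or later. Because documents are merged on _id, running several chunks (even in parallel) against the same results collection should not create duplicates.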

Upvotes: 1
