user2793120

Reputation: 393

How can I get all the doc ids in MongoDB?

How can I get an array of all the document ids in MongoDB? I only need the ids, not the document contents.

Upvotes: 30

Views: 60296

Answers (9)

Alexei Khlebnikov

Reputation: 2395

Solutions for Kotlin and Spring Data.

A trivial solution that fetches all the Documents and gets the IDs from them:

// Defining the Repository.
@Repository
interface MyRepository : MongoRepository<MyDocument, String> {
}

// Fetching the IDs.
val ids: List<String> = myRepository.findAll().mapNotNull { it._id }

An optimized solution that fetches only the IDs, using the MongoDB Aggregation feature:

// Defining the Repository.
@Repository
interface MyRepository : MongoRepository<MyDocument, String> {
    @Aggregation(pipeline = [
        "{ '\$project': { '_id': 1 } }"
    ])
    fun findAllIds(): List<String>
}

// Fetching the IDs.
val ids: List<String> = myRepository.findAllIds()

Upvotes: 0

user2959589

Reputation: 617

I struggled with this for a long time, and I'm answering this because I've got an important hint. It seemed obvious that:

db.c.find({},{_id:1});

would be the answer.

It worked, sort of. It would return the first 101 documents (the default size of the first batch) and then the application would pause. I didn't let it keep going. This happened both in Java using MongoOperations and in the mongo shell.

I looked at the mongo logs and saw that it was doing a collection scan (COLLSCAN) on a big collection of big documents. I thought, crazy, I'm projecting only _id, which is always indexed, so why would it attempt a collection scan?

I have no idea why it would do that, but the solution is simple:

db.c.find({},{_id:1}).hint({_id:1});

or in Java:

query.withHint("{_id:1}");

Then it was able to proceed along as normal, using stream style:

createStreamFromIterator(mongoOperations.stream(query, MortgageDocument.class))
     .map(MortgageDocument::getId)
     .forEach(transformer);

Mongo can do some good things and it can also get stuck in really confusing ways. At least that's my experience so far.

Upvotes: 5

ThoughtFool

Reputation: 21

One of the examples above worked for me with a minor tweak: since I was calling it on my Mongoose model, I left out the second (options) object.

// No callback here: awaiting the query already resolves to the result.
const idArray = await Model.distinct('_id', {});
// idArray is your array of ids

Upvotes: 0

Dayron Alfaro

Reputation: 11

Try an aggregation pipeline, like this:

db.collection.aggregate([
    { $match: { deletedAt: null } },
    { $group: { _id: "$_id" } }
])

This returns documents with the following structure (the $match stage here skips documents whose deletedAt is set; drop it to get every id):

{ _id: ObjectId("5fc98977fda32e3458c97edd") }

Upvotes: 1

Shashi Rivankar

Reputation: 1

I had a similar requirement: getting the ids of a collection with 50+ million documents. I tried many approaches, and the fastest turned out to be running mongoexport with only the _id field.

Upvotes: 0

JohnnyHK

Reputation: 311865

You can do this in the Mongo shell by calling map on the cursor like this:

var a = db.c.find({}, {_id:1}).map(function(item){ return item._id; })

The result is that a is an array of just the _id values.

The way it works in Node is similar.

(This is MongoDB Node driver v2.2, and Node v6.7.0)

db.collection('...')
  .find(...)
  .project( {_id: 1} )
  .map(x => x._id)
  .toArray();

Remember to call map before toArray: this map is NOT JavaScript's Array.prototype.map but the cursor's own map method, provided by the MongoDB driver. It is applied to each document as the cursor is read, so toArray resolves directly to the _id values.
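
For reference, the same chain can be wrapped in an async function. This is only a sketch: fetchAllIds is a hypothetical helper name, and collection is assumed to be a Collection object from the official mongodb Node.js driver, with toArray() returning a promise when it is called without a callback.

```javascript
// Hypothetical helper: `collection` is assumed to be a Collection object
// from the official `mongodb` Node.js driver.
async function fetchAllIds(collection) {
  return collection
    .find({})                 // no filter: every document
    .project({ _id: 1 })      // ask the server for the _id field only
    .map((doc) => doc._id)    // cursor map: unwrap each { _id } document
    .toArray();               // resolves to an array of _id values
}
```

Inside another async function it would be called along the lines of `const ids = await fetchAllIds(db.collection('c'));`.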

Upvotes: 65

Lucio Mollinedo

Reputation: 2424

I was also wondering how to do this with the MongoDB Node.js driver, like @user2793120. Someone else suggested iterating through the results with .each, which seemed highly inefficient to me. I used MongoDB's aggregation instead:

    myCollection.aggregate([
        { $match: { /* any search criteria, following $match's rules */ } },
        { $sort: { /* any sort criteria, following $sort's rules */ } },
        { $group: { _id: null, ids: { $addToSet: "$_id" } } }
    ]).exec()

The $sort stage is optional, and so is $match if you want every _id in the collection. If you console.log the result, you'd see something like:

    [ { _id: null, ids: [ '56e05a832f3caaf218b57a90', '56e05a832f3caaf218b57a91', '56e05a832f3caaf218b57a92' ] } ]

Then just use the contents of result[0].ids somewhere else.

The key part here is the $group stage. $group requires an _id field; setting it to null puts every document into a single group, and the ids field then collects all the _ids into one new array. If you are collecting a field other than the document's own _id (for example, one that holds another document's _id) and don't mind duplicates, you can use $push instead of $addToSet.
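
One detail worth guarding against when reading result[0].ids: if the $match stage matches nothing, the pipeline produces no group document at all, so result is an empty array and result[0] is undefined. A minimal sketch of a guard follows. extractIds is a hypothetical helper name, not part of any driver or of Mongoose, and it assumes the result shape shown above.

```javascript
// Hypothetical helper (not a MongoDB or Mongoose API).
// Assumes `result` is the array produced by the $group pipeline above:
// either [] or [ { _id: null, ids: [...] } ].
function extractIds(result) {
  if (!Array.isArray(result) || result.length === 0) {
    return []; // nothing matched, so there is no group document
  }
  const ids = result[0].ids;
  return Array.isArray(ids) ? ids : [];
}

// Example with the same shape as the output shown above:
const sample = [
  { _id: null, ids: ['56e05a832f3caaf218b57a90', '56e05a832f3caaf218b57a91'] }
];
console.log(extractIds(sample).length); // 2
console.log(extractIds([]).length);     // 0
```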

Upvotes: 6

whitfin

Reputation: 4629

One way is to simply use the runCommand API (in these examples the collection itself is named "distinct"):

db.runCommand({ distinct: "distinct", key: "_id" })

which gives you something like this:

{
    "values" : [
        ObjectId("54cfcf93e2b8994c25077924"),
        ObjectId("54d672d819f899c704b21ef4"),
        ObjectId("54d6732319f899c704b21ef5"),
        ObjectId("54d6732319f899c704b21ef6"),
        ObjectId("54d6732319f899c704b21ef7"),
        ObjectId("54d6732319f899c704b21ef8"),
        ObjectId("54d6732319f899c704b21ef9")
    ],
    "stats" : {
        "n" : 7,
        "nscanned" : 7,
        "nscannedObjects" : 0,
        "timems" : 2,
        "cursor" : "DistinctCursor"
    },
    "ok" : 1
}

However, there's an even nicer way using the actual distinct API:

var ids = db.distinct.distinct('_id', {}, {});

which just gives you an array of ids:

[
    ObjectId("54cfcf93e2b8994c25077924"),
    ObjectId("54d672d819f899c704b21ef4"),
    ObjectId("54d6732319f899c704b21ef5"),
    ObjectId("54d6732319f899c704b21ef6"),
    ObjectId("54d6732319f899c704b21ef7"),
    ObjectId("54d6732319f899c704b21ef8"),
    ObjectId("54d6732319f899c704b21ef9")
]

Not sure about the first version, but the latter is definitely supported in the Node.js driver (which I saw you mention you wanted to use). That would look something like this:

db.collection('c').distinct('_id', {}, {}, function (err, result) {
    // result is your array of ids
})

Upvotes: 14

Anuj Aneja

Reputation: 1344

Another way to do this in the mongo shell could be:

var arr = [];
db.c.find({}, {_id: 1}).forEach(function (doc) { arr.push(doc._id); });
printjson(arr);

Hope that helps!!!

Thanks!!!

Upvotes: 5
