gotqn
gotqn

Reputation: 43646

Get distinct embedded documents

I have a collection with elements like this:

{
    "_id": {
        "$oid": "56d61033a378eccde8a8354f"
    },
    "id": "10021-2015-ENFO",
    "certificate_number": 9278806,
    "business_name": "ATLIXCO DELI GROCERY INC.",
    "date": "Feb 20 2015",
    "result": "No Violation Issued",
    "sector": "Cigarette Retail Dealer - 127",
    "address": {
        "city": "RIDGEWOOD",
        "zip": 11385,
        "street": "MENAHAN ST",
        "number": 1712
    }
}

What I want is to add _id for each address. This is one time operation and I am doing it for researching/testing purpose.

I think that I can split the task into the following steps:

I select the documents like like this:

db.ci.find({}, {"address":1, "_id":0});

but I fail to make a distinct from it using the Distinct function or foreach. I try to use aggregate, too but did not make it work.

Could anyone give me some tips?

I am using Ubuntu 20.04, mongodb 4.2.7, vs code with mongodb extension.

Upvotes: 0

Views: 115

Answers (1)

Joe
Joe

Reputation: 28356

You can get a unique integer assigned to each in a single aggregation by using $facet to keep the original documents while using $addToSet to get the list of unique addresses, and $indexOfArray to assign the value to each document:

db.collection.aggregate([
  {$facet: {
      docs: [{$match: {}}],
      addresses: [
        {$group: {
            _id: null,
            address: {$addToSet: "$address"}
        }}
      ]
  }},
  {$unwind: "$docs"},
  {$unwind: "$addresses"},
  {$addFields: {
      "address.id": {
        $indexOfArray: [
          "$addresses.address",
          "$docs.address"
        ]
      }
  }},
  {$replaceRoot:{newRoot:"$docs"}},
  {$out:"new_collection"}
])

Playground

If you prefer an ObjectId, you have the right idea, one aggration to output the unique addresses to a temporary collection so each is auto-assigned an _id, then a second aggregation to embed those _id values in the original documents. For this example I gathered the _id of the original document to simplify the later lookup.

db.collection.aggregate([
  {$group:{
       _id:"$address",
       ids:{$push:"$_id"}
  }},
  {$project:{
       address:"$_id",
       ids:1,
       _id:0
  }},
  {$out: "temp_address_collection"}
])

Playground

db.collection.aggregate([
  {$lookup:{
      from:"temp_address_collection",
      localField:"_id",
      foreignField:"ids",
      as: "matched"
  }},
  {$addFields:{matched:{$arrayElemAt:["$matched",0]}}},
  {$addFields:{"$address.id": "$matched._id"}},
  {$project:{matched:0}},
  {$out:"new_collection"}
])

Playground

Upvotes: 1

Related Questions