A.V. Arno
A.V. Arno

Reputation: 563

How can I flatten double arrays in mongoDB?

Some fields in my mongoDB documents look like this:

{
...
Countries: [["Spain", "France"]]
...
}

Or this:

{
...
Countries: [["Spain"],["Russia", "Egypt"]]
...
}

What I want to do is to turn [["Spain", "France"]] into ["Spain", "France"] and [["Spain"],["Russia", "Egypt"]] into ["Spain", "Russia", "Egypt"], similar to using the flatten method in Ruby.

Is there a way to flatten arrays in mongoDB? I need to flatten arrays in all documents in entire collection, not just a single doc, if that matters, also, the values and their quantity in arrays varies between documents.

I am also using Ruby as a driver for mongo, so a method using a Ruby driver would be also useful to me.

Upvotes: 14

Views: 13123

Answers (5)

Vincent Couturier
Vincent Couturier

Reputation: 526

since the version 5 of }, we have an alternative using the $function operator like this:

db.collection.aggregate([
   {
      $addFields: { 
        Countries: {
          $function: {
            body: function(items) {
              return [].concat(...items)
            },
            args: [ "$Countries" ],
            lang: "js"
          }
       }
     }
   }
])

the ideal solution would be to be able to use $concatArrays but unfortunately this operator is not able to take into account an array field (at the time of this writing). There's a ticket in the mongodb backlog about this limitation on array fields : https://jira.mongodb.org/browse/SERVER-31991 . If you are interested by the flexibility brought by this enhancement please vote. This ticket has been here since 2019 so it seems to need help to be prioritized ;-)

Upvotes: 0

Matt Lo
Matt Lo

Reputation: 5741

In Mongo 3.4+ you can use $reduce to flatten 2d arrays.

db.collection.aggregate(
  [
    {
      $project: {
        "countries": {
          $reduce: {
            input: '$Countries',
            initialValue: [],
            in: {$concatArrays: ['$$value', '$$this']}
          }
        }
      }
    }
  ]
)

Docs: https://docs.mongodb.com/manual/reference/operator/aggregation/reduce/

Upvotes: 24

BatScream
BatScream

Reputation: 19700

You need perform an aggregation operation with two unwind stages and a single group stage. The basic rule being you unwind as many times as the level of nest depth. Here the level of nesting is 2, so we unwind two times.

 collection.aggregate([
 {$unwind => "$Countries"},
 {$unwind => "$Countries"},
 {$group => {"_id":"$_id","Countries":{$push => "$Countries"}}}
 ])

The first $unwind stage produces the result:

{
        "_id" : ObjectId("54a32e0fc2eaf05fc77a5ea4"),
        "Countries" : [
                "Spain",
                "France"
        ]
}
{
        "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"),
        "Countries" : [
                "Spain"
        ]
}
{
        "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"),
        "Countries" : [
                "Russia",
                "Egypt"
        ]
}

The second $unwind stage further flattens the Countries array:

{ "_id" : ObjectId("54a32e0fc2eaf05fc77a5ea4"), "Countries" : "Spain" }
{ "_id" : ObjectId("54a32e0fc2eaf05fc77a5ea4"), "Countries" : "France" }
{ "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"), "Countries" : "Spain" }
{ "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"), "Countries" : "Russia" }
{ "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"), "Countries" : "Egypt" }

Now the final $group stage groups the records based on the _id,and accumulates the country names in a single array.

{
        "_id" : ObjectId("54a32e4ec2eaf05fc77a5ea5"),
        "Countries" : [
                "Spain",
                "Russia",
                "Egypt"
        ]
}
{
        "_id" : ObjectId("54a32e0fc2eaf05fc77a5ea4"),
        "Countries" : [
                "Spain",
                "France"
        ]
}

If you wish to keep other fields in the document then you need to explicitly specify the names of the fields other than the country field,(field1,field2,etc..), using the $first operator. You can write/overwrite a collection by specifying the name of the collection in the $out stage.

collection.aggregate([
 {$unwind => "$Countries"},
 {$unwind => "$Countries"},
 {$group => {"_id":"$_id","Countries":{$push => "$Countries"},
             "field1":{$first => "$field1"}}},
 {$out => "collection"}
 ])

You need to explicitly specify the fields so that you don't get a redundant Countries field.

You can use the $$ROOT system variable to store the entire document, but that would make the Countries field redundant.One outside the doc and one inside the doc.

collection.aggregate([
 {$unwind => "$Countries"},
 {$unwind => "$Countries"},
 {$group => {"_id":"$_id","Countries":{$push => "$Countries"},
             "doc":{$first => "$$ROOT"}}},
 {$out => "collection"}
 ])

Upvotes: 9

anhlc
anhlc

Reputation: 14469

Your data for Countries are not in a good format, so you may consider to convert them. This is a script to flatten the array in Countries field and save it the origin documents that you can run in a mongo shell:

function flattenArray(inArr) {
    var ret = [];
    inArr.forEach(function(arr) {
        if (arr.constructor.toString().indexOf("Array") > -1) {
           ret = ret.concat(flattenArray(arr));
        } else {
           ret.push(arr);                   
        }
    });
    return ret;
}


db.collection.find({
  'Countries': {
    '$exists': true
  }
}).forEach(function(doc){
  doc.Countries = flattenArray(doc.Countries);
  db.collection.save(doc);
});

Upvotes: 4

Disposer
Disposer

Reputation: 6371

Try this:

db.test2.aggregate([
   {"$unwind" : "$Countries"},
   {"$unwind" : "$Countries"},
   {$group : { _id : '$_id', Countries: { $addToSet: "$Countries" }}},
]).result

Upvotes: 4

Related Questions