Joe
Joe

Reputation: 4254

Find common string, aggregtion, MongoDB

Doc 1:

{
  typeId: 'A1',
  name: 'EAGLE-25'
}

Doc 2:

{
  typeId: 'A1',
  name: 'EAGLE-32'
}

Doc 3:

{
  typeId: 'B1',
  name: 'FOX5'
}

Doc 4:

{
  typeId: 'B1',
  name: 'FOX15'
}

Wanted result after aggregated query:

[
  {
     typeId: 'A1',
     commonName: 'EAGLE',
     names: ['EAGLE-25', 'EAGLE-32']
  },
  {
     typeId: 'B1',
     commonName: 'FOX',
     names: ['FOX5', 'FOX15']
  }
]

Is this possible with the aggregation framework?

Upvotes: 0

Views: 200

Answers (1)

dnickless
dnickless

Reputation: 10918

Here you go:

db.getCollection('test').aggregate
(
  {
    $group:
    {
      _id:
      {
        "typeId": "$typeId",
        "commonName": { "$substrCP": [ "$name", 0, { "$indexOfCP": [ "$name", "-" ] } ] } // group by substring of "name" property 
      },
      "names": { $push: "$name" } // create the "names" array per group
    }
  },
  {
    $project:
    {
      "_id": 0, // get rid of _id field
      "typeId": "$_id.typeId", // flatten "_id.typeId" into "typeId"
      "commonName": "$_id.commonName", // flatten "_id.commonName" into "commonName"
      "names": "$names" // include "names" array the way it is
    }
  }
)

As always with MongoDB aggregations you can get an idea of what's going on by simply reducing the projection stages step by step starting from the end of the query.

EDIT:

My answer above doesn't make too much sense anymore after your latest change to your question. And I cannot think of a way to make your generic "least common denominator" query work.

But more importantly, I think your are missing something in your specification. Imagine you have the following elements in your database:

{
  typeId: 'A1',
  name: 'EAGLE-25'
}

{
  typeId: 'A1',
  name: 'EATS-26'
}

{
  typeId: 'A1',
  name: 'EVERYTHING-27'
}

What you would get with your "least common denominator" concept is this:

[
  {
     typeId: 'A1',
     commonName: 'E',
     names: ['EAGLE-25', 'EATS-26', 'EVERYTHING-27']
  }
]

It would appear that this result does not make a lot of sense anymore...?!

EDIT 2:

I had an idea which works on the assumption that you can define a maximum length of the "common prefix". We get pretty close to what you want, I think:

db.getCollection('eagle').aggregate
(
  {
    $project:
    {
      "range": {$range: [ 1, 10, 1 ]}, // adjust the '10' to match the maximum length of your "least common prefix"
      "typeId": "$typeId",
      "name": "$name"
    }
  },
  { $unwind: "$range" },
  {
    $project:
    {
      "typeId": "$typeId",
      "name": "$name",
      "commonName": { $substrCP: ["$name", 0, "$range"] } // extract the first couple of characters from the name
    }
  },
  {
    $group: { _id: {"typeId": "$typeId", "commonName": "$commonName"}, "names": { $addToSet: "$name" } }
  },
  {
    $project:
    {
      "_id": 0, // get rid of _id field
      "typeId": "$_id.typeId", // flatten "_id.typeId" into "typeId"
      "commonName": "$_id.commonName", // flatten "_id.commonName" into "commonName"
      "names": "$names" // include "names" array the way it is
    }
  }
)

Upvotes: 1

Related Questions