Bas
Bas

Reputation: 337

MongoDB aggregation group by similar string

Im starting to learn aggregations for Mongo, but for my project i found a lot of brands in my collection with very similar names, like 'BrandA' and 'BrandA tech'. Is there a way to group them at the end of my aggregation?

I have 2 collections in my database:

The first one is for brands:

{
  _id: ObjectId(),
  name: String
}

The second one is for products:

{
  _id: ObjectId(),
  name: String,
  brand: ObjectId() // referring to _id of brands
}

Now lets say i have the following brands:

{_id: ObjectId('5a9fd2b8045b020013de2a47'), name: 'brand1'},
{_id: ObjectId('5a9fcf94d28420245451a39c'), name: 'brand2'},
{_id: ObjectId('5a9fcf94d28420245451a39a'), name: 'brand1 sub1'},
{_id: ObjectId('5a9fe8bf045b020013de2a6d'), name: 'sub2 brand2'}

And the following products:

{_id: ObjectId(''), name: 'item1', brand: ObjectId('5a9fd2b8045b020013de2a47')},
{_id: ObjectId(''), name: 'item2', brand: ObjectId('5a9fcf94d28420245451a39c')},
{_id: ObjectId(''), name: 'item3', brand: ObjectId('5a9fd2b8045b020013de2a47')},
{_id: ObjectId(''), name: 'item4', brand: ObjectId('5a9fcf94d28420245451a39a')},
{_id: ObjectId(''), name: 'item5', brand: ObjectId('5a9fe8bf045b020013de2a6d')},
{_id: ObjectId(''), name: 'item6', brand: ObjectId('5a9fd2b8045b020013de2a47')},
{_id: ObjectId(''), name: 'item7', brand: ObjectId('5a9fcf94d28420245451a39c')},
{_id: ObjectId(''), name: 'item8', brand: ObjectId('5a9fcf94d28420245451a39a')}

The query I have now:

db.getCollection('products').aggregate([
  {$group: {
    _id: '$brand',
    amount: { $sum: 1 },
  }},
  {
    $sort: { 'amount': -1 }
  },{$lookup: {
    from: 'brands',
    localField: '_id',
    foreignField: '_id',
    as: 'lookup'
  }},
  {$unwind: {path: '$lookup'}},
  {$project: {
    _id: '$_id',
    brandName: '$lookup.name',
    amount: '$amount'
  }}
]);

Result:

{_id: ObjectId('5a9fd2b8045b020013de2a47'), brandName: 'brand1', amount: 3}
{_id: ObjectId('5a9fcf94d28420245451a39c'), brandName: 'brand2', amount: 2}
{_id: ObjectId('5a9fcf94d28420245451a39a'), brandName: 'brand1 sub1', amount: 2}
{_id: ObjectId('5a9fe8bf045b020013de2a6d'), brandName: 'sub2 brand2', amount: 1}

Result I want:

{_id: ObjectId(null), brandName: 'brand1', amount: 5},
{_id: ObjectId(null), brandName: 'brand2', amount: 3}

Is it possible to to group the result I have now by finding similar strings in brandName? Like grouping 'brand1' and 'brand1 sub1' or 'brand2' and 'sub2 brand2'?

Upvotes: 0

Views: 1925

Answers (2)

Lars Hendriks
Lars Hendriks

Reputation: 1058

a change in the model could easily achieve this. just add the items in an array to a brand. then you instantly get a count by using the array's length and the query speed is faster.

Upvotes: 0

Michael Dussere
Michael Dussere

Reputation: 498

I think that you could do what you want by using $split and $unwind

split will transform your string into an array of words and unwind will create as many entries as you have words in the array.

Then you can apply the pipeline you already prepared to count the occurences.

Upvotes: 1

Related Questions