Amanda

Reputation: 2163

Counting words in an array: how can I query this with Mongo?

"sourceList": [
        {
           "source" : "hello world, how are you?",
           "_id" : ObjectId("5f0eb9946db57c0007841153")
        },
        {
           "source" : "hello world, I am fine",
           "_id" : ObjectId("5f0eb9946db57c0007841153")
        },
        {
           "source" : "Is it raining?",
           "_id" : ObjectId("5f0eb9946db57c0007841153")
        }
]

Word counts: "hello world, how are you?" = 5, "hello world, I am fine" = 5, and "Is it raining?" = 3.

So the total number of words is 13.

Is there a MongoDB query that performs this calculation? I could do it in JavaScript, but is there a direct way to do it in MongoDB?
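For comparison, the plain-JavaScript route mentioned above could look like the sketch below. It assumes a word is anything separated by single spaces, which is the rule the counts above follow (so "world," counts as one word). The helper name countWords is hypothetical.

```javascript
// Hypothetical helper: counts words in one document's sourceList,
// assuming words are separated by single spaces (punctuation stays attached).
function countWords(sourceList) {
  return sourceList
    .map((item) => item.source.split(" ").length) // words per sentence
    .reduce((total, n) => total + n, 0);          // sum over all sentences
}

// The sentences from the question (the ObjectId values are omitted).
const sourceList = [
  { source: "hello world, how are you?" },
  { source: "hello world, I am fine" },
  { source: "Is it raining?" },
];

console.log(countWords(sourceList)); // 13
```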

EDIT


Is there a way to run this query across documents? For documents matching specific criteria, I want to run a similar calculation with an added constraint: words in duplicate sentences must not be counted twice. For example,

Document - 1

"sourceList": [
    {
       "source" : "hello world, how are you?",
       "_id" : ObjectId("5f0eb9946db57c0007841153")
    },
    {
       "source" : "hello world, I am fine",
       "_id" : ObjectId("5f0eb9946db57c0007841153")
    },
    {
       "source" : "Is it raining?",
       "_id" : ObjectId("5f0eb9946db57c0007841153")
    }
]

Document - 2

"sourceList": [
    {
       "source" : "hello world, how are you?",
       "_id" : ObjectId("5f0eb9946db57c0007841153")
    },
    {
       "source" : "hello world, I am fine",
       "_id" : ObjectId("5f0eb9946db57c0007841153")
    },
    {
       "source" : "Is it raining?",
       "_id" : ObjectId("5f0eb9946db57c0007841153")
    }
]

Here the count remains 13, because the sentences are exactly the same in both documents. But suppose we combine Document 1 with Document 3 (given below):

"sourceList": [
    {
       "source" : "Look at the beautiful tiger!",
       "_id" : ObjectId("5f0eb9946db57c0007841153")
    }
]

The count would then be 13 + 5 (Document 3) = 18.
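The rule described above (each distinct sentence contributes its words once, however many documents contain it) can be checked with a small JavaScript sketch. It assumes two sentences are duplicates only when their strings are identical; the document shapes and the countUniqueSentenceWords name are illustrative.

```javascript
// Hypothetical helper: total words across several documents, counting each
// distinct sentence only once (sentences are duplicates only if identical).
function countUniqueSentenceWords(docs) {
  const unique = new Set();
  for (const doc of docs) {
    for (const item of doc.sourceList) {
      unique.add(item.source); // a Set ignores repeated sentences
    }
  }
  let total = 0;
  for (const sentence of unique) {
    total += sentence.split(" ").length; // words separated by single spaces
  }
  return total;
}

const doc1 = {
  sourceList: [
    { source: "hello world, how are you?" },
    { source: "hello world, I am fine" },
    { source: "Is it raining?" },
  ],
};
// Document 2 repeats Document 1's sentences exactly.
const doc2 = { sourceList: doc1.sourceList.map((item) => ({ ...item })) };
const doc3 = { sourceList: [{ source: "Look at the beautiful tiger!" }] };

console.log(countUniqueSentenceWords([doc1, doc2])); // 13
console.log(countUniqueSentenceWords([doc1, doc3])); // 18
```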

Upvotes: 0

Views: 391

Answers (1)

Gibbs

Reputation: 22974

Yes, you can do this with MongoDB's aggregation framework.

Mongo Playground

db.collection.aggregate([
  {
    "$unwind": "$sourceList" // one output document per array element
  },
  {
    $project: {
      "sp": {
        $split: [
          "$sourceList.source", // split each sentence on spaces
          " "
        ]
      }
    }
  },
  {
    "$project": {
      "sizes": {
        "$size": "$sp" // number of words in each sentence
      }
    }
  },
  {
    "$group": {
      "_id": "$_id", // group by the original document _id to reverse the unwind
      "count": {
        "$sum": "$sizes" // add up the word counts per document
      }
    }
  }
])
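To make each stage concrete, here is a plain-JavaScript emulation of what the pipeline above computes for two sample documents. It is only an illustration (MongoDB evaluates these stages on the server), the string _id values stand in for ObjectIds, and wordCountPerDocument is a hypothetical name.

```javascript
// Plain-JavaScript emulation of the per-document pipeline (illustration only).
function wordCountPerDocument(docs) {
  // $unwind: one record per sourceList element, keeping the parent _id
  const unwound = docs.flatMap((doc) =>
    doc.sourceList.map((item) => ({ _id: doc._id, sourceList: item }))
  );
  // The two $project stages: $split on " ", then $size of the resulting array
  const sized = unwound.map((rec) => ({
    _id: rec._id,
    sizes: rec.sourceList.source.split(" ").length,
  }));
  // $group by _id with $sum, undoing the unwind
  const counts = new Map();
  for (const rec of sized) {
    counts.set(rec._id, (counts.get(rec._id) || 0) + rec.sizes);
  }
  return Array.from(counts, ([_id, count]) => ({ _id, count }));
}

const docs = [
  {
    _id: "doc1",
    sourceList: [
      { source: "hello world, how are you?" },
      { source: "hello world, I am fine" },
      { source: "Is it raining?" },
    ],
  },
  { _id: "doc3", sourceList: [{ source: "Look at the beautiful tiger!" }] },
];

// Expected counts: doc1 = 13, doc3 = 5
console.log(wordCountPerDocument(docs));
```

Unlike this emulation, MongoDB does not guarantee the order of $group output documents.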

Update: to get a single total across all documents, group by null instead of by $_id:

db.collection.aggregate([
  {
    "$unwind": "$sourceList"
  },
  {
    $project: {
      "sp": {
        $split: [
          "$sourceList.source",
          " "
        ]
      }
    }
  },
  {
    "$project": {
      "sizes": {
        "$size": "$sp"
      }
    }
  },
  {
    "$group": {
      "_id": null, // a single group for all documents, giving one grand total
      "count": {
        "$sum": "$sizes"
      }
    }
  }
])

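For huge collections you may need to enable the allowDiskUse option, but this is still a very heavy operation on large collections.

Update: to avoid counting the words of duplicate sentences twice across documents, collect the split sentences with $addToSet before summing: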

db.collection.aggregate([
  {
    "$unwind": "$sourceList"
  },
  {
    $project: {
      "sp": {
        $split: [
          "$sourceList.source",
          " "
        ]
      }
    }
  },
  {
    "$group": {
      "_id": null,
      "elements": {
        $addToSet: "$sp" // keeps each distinct word array (i.e. each distinct sentence) once
      }
    }
  },
  {
    "$unwind": "$elements"
  },
  {
    "$project": {
      "sizes": {
        "$size": "$elements"
      }
    }
  },
  {
    "$group": {
      "_id": null,
      "count": {
        "$sum": "$sizes"
      }
    }
  }
])
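$addToSet compares the split arrays by value, so two sentences collapse into one element only when every word matches in order. The sketch below mimics that behaviour in plain JavaScript by keying each array on its JSON.stringify form. This is an emulation for illustration, not how the server implements it, and the uniqueSentenceWordTotal name and sample documents are assumptions.

```javascript
// Emulates the dedup pipeline: $unwind + $split, then $group with
// $addToSet (value equality on arrays), then $unwind + $size + $sum.
function uniqueSentenceWordTotal(docs) {
  const elements = new Map(); // JSON key -> split array, mimicking $addToSet
  for (const doc of docs) {
    for (const item of doc.sourceList) {
      const sp = item.source.split(" "); // $split on a single space
      elements.set(JSON.stringify(sp), sp); // identical arrays share one key
    }
  }
  let count = 0;
  for (const sp of elements.values()) {
    count += sp.length; // $size of each unique array, summed by $group
  }
  return count;
}

const sentences = [
  { source: "hello world, how are you?" },
  { source: "hello world, I am fine" },
  { source: "Is it raining?" },
];
const document1 = { sourceList: sentences };
const document2 = { sourceList: sentences.slice() }; // same sentences again
const document3 = { sourceList: [{ source: "Look at the beautiful tiger!" }] };

console.log(uniqueSentenceWordTotal([document1, document2])); // 13
console.log(uniqueSentenceWordTotal([document1, document3])); // 18
```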

Upvotes: 2
