Reputation: 2163
"sourceList": [
{
"source" : "hello world, how are you?",
"_id" : ObjectId("5f0eb9946db57c0007841153")
},
{
"source" : "hello world, I am fine",
"_id" : ObjectId("5f0eb9946db57c0007841153")
},
{
"source" : "Is it raining?",
"_id" : ObjectId("5f0eb9946db57c0007841153")
}
]
Total words in "hello world, how are you?" = 5, in "hello world, I am fine" = 5, and in "Is it raining?" = 3.
Thus the total number of words = 13.
Is there a MongoDB query that does this calculation? I could do it in JavaScript, but is there a direct way to do it in a Mongo query?
EDIT
Is there a way to run this query across documents? For documents matching specific criteria, I want to run a similar calculation with one added constraint: the words of duplicate sentences must not be counted twice. For example,
Document - 1
"sourceList": [
{
"source" : "hello world, how are you?",
"_id" : ObjectId("5f0eb9946db57c0007841153")
},
{
"source" : "hello world, I am fine",
"_id" : ObjectId("5f0eb9946db57c0007841153")
},
{
"source" : "Is it raining?",
"_id" : ObjectId("5f0eb9946db57c0007841153")
}
]
Document - 2
"sourceList": [
{
"source" : "hello world, how are you?",
"_id" : ObjectId("5f0eb9946db57c0007841153")
},
{
"source" : "hello world, I am fine",
"_id" : ObjectId("5f0eb9946db57c0007841153")
},
{
"source" : "Is it raining?",
"_id" : ObjectId("5f0eb9946db57c0007841153")
}
]
Here the count remains 13, because the sentences are exactly the same in both documents. But if we combine Document 1 with Document 3 (given below),
"sourceList": [
{
"source" : "Look at the beautiful tiger!",
"_id" : ObjectId("5f0eb9946db57c0007841153")
}
]
the count would be 13 + 5 (Document 3) = 18.
Upvotes: 0
Views: 391
Reputation: 22974
Yes, you can do that with the aggregation framework.
db.collection.aggregate([
  {
    "$unwind": "$sourceList" // one output document per array element
  },
  {
    $project: {
      "sp": {
        $split: [
          "$sourceList.source", // split the sentence on single spaces
          " "
        ]
      }
    }
  },
  {
    "$project": {
      "sizes": {
        "$size": "$sp" // count the words in each split array
      }
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "count": {
        "$sum": "$sizes" // group by _id to reverse the unwind and add up the sizes
      }
    }
  }
])
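As a quick sanity check on what the pipeline computes, here is a minimal plain JavaScript sketch of the same logic, run in memory rather than in MongoDB. The names `countWords` and `sampleSourceList` are made up for this illustration, and the `_id` fields are left out because they play no part in the count.

```javascript
// In-memory mirror of the pipeline: split each sentence on a single space
// and add up the number of pieces.
function countWords(sourceList) {
  return sourceList
    .map((item) => item.source.split(" ").length) // like $split followed by $size
    .reduce((total, size) => total + size, 0); // like $sum in the $group stage
}

// The sourceList from the question, without the _id fields.
const sampleSourceList = [
  { source: "hello world, how are you?" },
  { source: "hello world, I am fine" },
  { source: "Is it raining?" },
];

console.log(countWords(sampleSourceList)); // 13
```

Like the pipeline, this treats every space-separated piece as a word, so punctuation stays attached to its word and consecutive spaces would produce empty pieces that are counted too.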
Update: to get a single total across all documents, group on null instead of on _id:
db.collection.aggregate([
  {
    "$unwind": "$sourceList"
  },
  {
    $project: {
      "sp": {
        $split: [
          "$sourceList.source",
          " "
        ]
      }
    }
  },
  {
    "$project": {
      "sizes": {
        "$size": "$sp"
      }
    }
  },
  {
    "$group": {
      "_id": null,
      "count": {
        "$sum": "$sizes"
      }
    }
  }
])
For huge collections you may need to set allowDiskUse, but be aware that this is a very heavy operation on larger collections.
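A minimal sketch of how the option can be passed, assuming the same collection name as above: `allowDiskUse` goes in the options document that `aggregate` takes as its second argument. The variable names `pipeline` and `aggregateOptions` are only for illustration, and the call is guarded so the snippet also loads outside the mongo shell.

```javascript
// Same stages as the pipeline above, kept in a variable for reuse.
const pipeline = [
  { $unwind: "$sourceList" },
  { $project: { sp: { $split: ["$sourceList.source", " "] } } },
  { $project: { sizes: { $size: "$sp" } } },
  { $group: { _id: null, count: { $sum: "$sizes" } } },
];

// Lets stages that exceed the in-memory limit spill to temporary files on disk.
const aggregateOptions = { allowDiskUse: true };

// `db` only exists inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  printjson(db.collection.aggregate(pipeline, aggregateOptions).toArray());
}
```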
Update: to avoid counting duplicate sentences twice, collect the split sentences into a set with $addToSet before counting:
db.collection.aggregate([
  {
    "$unwind": "$sourceList"
  },
  {
    $project: {
      "sp": {
        $split: [
          "$sourceList.source",
          " "
        ]
      }
    }
  },
  {
    "$group": {
      "_id": null,
      "elements": {
        $addToSet: "$sp" // identical word arrays are stored only once
      }
    }
  },
  {
    "$unwind": "$elements"
  },
  {
    "$project": {
      "sizes": {
        "$size": "$elements"
      }
    }
  },
  {
    "$group": {
      "_id": null,
      "count": {
        "$sum": "$sizes"
      }
    }
  }
])
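To check the expected totals from the question (13 for Document 1 + Document 2, 18 for Document 1 + Document 3), here is a rough in-memory JavaScript sketch of the deduplicating count. The names `countUniqueSentenceWords`, `doc1`, `doc2` and `doc3` are made up for this example. It deduplicates on the sentence string rather than on the split array, which gives the same result because two sentences split into the same array exactly when the strings are equal.

```javascript
// Count words across documents, counting each distinct sentence only once.
function countUniqueSentenceWords(docs) {
  const uniqueSentences = new Set(); // plays the role of $addToSet
  for (const doc of docs) {
    for (const item of doc.sourceList) {
      uniqueSentences.add(item.source);
    }
  }
  let total = 0;
  for (const sentence of uniqueSentences) {
    total += sentence.split(" ").length; // like $split followed by $size
  }
  return total;
}

// Documents from the question, without the _id fields.
const doc1 = {
  sourceList: [
    { source: "hello world, how are you?" },
    { source: "hello world, I am fine" },
    { source: "Is it raining?" },
  ],
};
const doc2 = { sourceList: doc1.sourceList.map((item) => ({ ...item })) };
const doc3 = { sourceList: [{ source: "Look at the beautiful tiger!" }] };

console.log(countUniqueSentenceWords([doc1, doc2])); // 13
console.log(countUniqueSentenceWords([doc1, doc3])); // 18
```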
Upvotes: 2