Reputation: 2728
I am trying to gather all the hashtags I have in a collection of tweets in MongoDB, and I want to count how many times each hashtag appears in the tweets. hcoll is the collection of hashtags, which is populated with this code:
// Upsert the document for this hashtag, record the tweet id, and bump the counter.
BasicDBObject key = new BasicDBObject("hashtag", hashtagobj.get("hashtag"));
BasicDBObject update = new BasicDBObject("$addToSet", new BasicDBObject("tweetsid", hashtagobj.get("_id")));
update.put("$inc", new BasicDBObject("count", 1));
hcoll.update(key, update, true, false); // upsert = true, multi = false
But if this code is executed a second time for the same tweets, the counter "count" is incremented even though the tweet id is not added to the array a second time.
I am looking for a way to increment the value of "count" only if the tweet id is not already in the array "tweetsid". But I want it with one query, since I understand how to do it using two or more queries. If this is not possible, please tell me so I can just go for it with two or more queries! Thanks :D
Upvotes: 0
Views: 438
Reputation: 5548
One possible solution is to modify the query document to assert that the tweetid in question is not already in the "tweetsid" array. If it is, the query will not match, and the update will not be performed.
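As a minimal sketch of this idea, the query and update documents could be built by a small helper. The name guardedUpdate is hypothetical, and the field names ("hashtag", "tweetsid", "count") are taken from the question; the helper only builds plain objects and does not talk to the database.

```javascript
// Hypothetical helper: builds the documents for a guarded update.
function guardedUpdate(hashtag, tweetId) {
  return {
    // Match the hashtag document only if tweetId is not yet in "tweetsid".
    query: { hashtag: hashtag, tweetsid: { $ne: tweetId } },
    // Both modifiers are applied together, and only when the query matches.
    update: { $addToSet: { tweetsid: tweetId }, $inc: { count: 1 } }
  };
}

// Possible use in the shell (not executed here):
//   var u = guardedUpdate("myHashTag", "id1");
//   db.hcoll.update(u.query, u.update);
```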
Here is an example using the JS shell. It is difficult to give an exact answer without an example document, so I have taken a guess at your document structure. Hopefully it is close enough so the example is relevant to you.
> db.hcoll.save({_id:1, hashtag:"myHashTag", count:0, tweetsid:[]})
> db.hcoll.find()
{ "_id" : 1, "hashtag" : "myHashTag", "count" : 0, "tweetsid" : [ ] }
The following update will add "id1" to the "tweetsid" array and increment the value of "count" by 1:
> db.hcoll.update({hashtag:"myHashTag", tweetsid:{$ne:"id1"}}, {$addToSet:{"tweetsid":"id1"}, $inc:{"count":1}})
> db.hcoll.find()
{ "_id" : 1, "count" : 1, "hashtag" : "myHashTag", "tweetsid" : [ "id1" ] }
If the update is performed again, "count" will not be incremented, because the {tweetsid:{$ne:"id1"}} part of the query does not match:
> db.hcoll.update({hashtag:"myHashTag", tweetsid:{$ne:"id1"}}, {$addToSet:{"tweetsid":"id1"}, $inc:{"count":1}})
> db.hcoll.update({hashtag:"myHashTag", tweetsid:{$ne:"id1"}}, {$addToSet:{"tweetsid":"id1"}, $inc:{"count":1}})
> db.hcoll.update({hashtag:"myHashTag", tweetsid:{$ne:"id1"}}, {$addToSet:{"tweetsid":"id1"}, $inc:{"count":1}})
> db.hcoll.find()
{ "_id" : 1, "count" : 1, "hashtag" : "myHashTag", "tweetsid" : [ "id1" ] }
>
I see from your post that you are performing the update with upsert=true, indicating that you would like the document to be created if it does not exist. Unfortunately, the update that I presented will not work correctly with upsert: if the new tweet id is already in the "tweetsid" array, the query will not match, and the upsert will create a new, duplicate document for the same hashtag.
> db.hcoll.update({hashtag:"myHashTag", tweetsid:{$ne:"id1"}}, {$addToSet:{"tweetsid":"id1"}, $inc:{"count":1}}, true, false)
> db.hcoll.find()
{ "_id" : 1, "count" : 1, "hashtag" : "myHashTag", "tweetsid" : [ "id1" ] }
{ "_id" : ObjectId("4f91ae48f48744310eab90d2"), "count" : 1, "hashtag" : "myHashTag", "tweetsid" : [ "id1" ] }
>
Hopefully the above will provide you with some ideas and help you to find a solution.
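One such idea, if two operations are acceptable, is to first make sure the hashtag document exists and then run the guarded update without upsert. This is only a sketch under assumptions: the function name recordHashtag is hypothetical, coll is assumed to expose the shell-style update(query, update, upsert, multi) call used above, and $setOnInsert requires MongoDB 2.4 or later.

```javascript
// Hypothetical two-step workaround (assumes MongoDB 2.4+ for $setOnInsert).
function recordHashtag(coll, hashtag, tweetId) {
  // Step 1: create the hashtag document if it is missing. An existing
  // document is left unchanged, because $setOnInsert only applies on insert.
  coll.update(
    { hashtag: hashtag },
    { $setOnInsert: { count: 0, tweetsid: [] } },
    true,   // upsert
    false   // multi
  );
  // Step 2: guarded update without upsert, so a non-matching query
  // (tweet id already recorded) changes nothing and creates nothing.
  coll.update(
    { hashtag: hashtag, tweetsid: { $ne: tweetId } },
    { $addToSet: { tweetsid: tweetId }, $inc: { count: 1 } },
    false,  // upsert
    false   // multi
  );
}
```

In the shell this might be called as recordHashtag(db.hcoll, "myHashTag", "id1"). Each step is atomic on its own, but the pair is not, so a unique index on "hashtag" would be needed to rule out duplicate documents from concurrent first-time upserts.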
Upvotes: 3
Reputation: 45287
But I want it with one query, since I understand how to do it using two or more queries. If this is not possible, please tell me so...
This is not possible.
In fact, I will go one step further: here is the JIRA ticket. You can vote for it there.
Upvotes: 1