Reputation: 71
I'm trying to get data from MongoDB without repeated values. I want to filter the following data
{"page":"www.abc.com","impressions":1,"position":144}
{"page":"www.abc.com","impressions":1,"position":8}
{"page":"www.xyz.com","impressions":7,"position":4}
{"page":"www.pqr.com","impressions":1,"position":7}
{"page":"www.abc.com","impressions":1,"position":19}
so that only the pages that appear exactly once remain, as follows. Any idea how I should do that?
{"page":"www.xyz.com","impressions":7,"position":4}
{"page":"www.pqr.com","impressions":1,"position":7}
Upvotes: 2
Views: 340
Reputation: 3289
In Java, with the MongoDB Java driver 3.0+, it could be:
import java.util.Arrays;
import org.bson.Document;
import com.mongodb.MongoClient;
import com.mongodb.client.AggregateIterable;
import com.mongodb.client.MongoCollection;

// The class name is only a placeholder so the snippet compiles.
public class DistinctPagesExample {
    public static void main(String[] args) {
        try (MongoClient client = new MongoClient("127.0.0.1")) {
            MongoCollection<Document> col = client.getDatabase("test").getCollection("test");
            // $group: one bucket per page, counting its documents
            Document groupFields = new Document("_id", "$page");
            groupFields.put("count", new Document("$sum", 1));
            groupFields.put("impressions", new Document("$first", "$impressions"));
            groupFields.put("position", new Document("$first", "$position"));
            // $match: keep only pages that occur exactly once
            Document matchFields = new Document("count", 1);
            // $project: drop _id and expose it again as "page"
            Document projectFields = new Document("_id", 0);
            projectFields.put("page", "$_id");
            projectFields.put("impressions", 1);
            projectFields.put("position", 1);
            AggregateIterable<Document> output = col.aggregate(Arrays.asList(
                new Document("$group", groupFields),
                new Document("$match", matchFields),
                new Document("$project", projectFields)
            ));
            for (Document doc : output) {
                System.out.println(doc);
            }
        }
    }
}
Output for your db is:
Document{{impressions=1.0, position=7.0, page=www.pqr.com}}
Document{{impressions=7.0, position=4.0, page=www.xyz.com}}
Upvotes: 2
Reputation: 103435
You should be able to run an aggregation pipeline that groups the documents by the page field using the $group pipeline operator, gets a count of the documents in each group using the $sum operator, and retains the other two fields using the $first (or $last) operator.
The next pipeline stage after $group should then filter the grouped documents on the count field, keeping only the groups with a count of 1, i.e. filtering out every page that has duplicates. Use the $match pipeline operator for this.
A final, cosmetic pipeline stage is $project, which reshapes each document in the stream: it can include, exclude or rename fields, inject computed fields, and create sub-document fields using mathematical expressions, dates, strings and/or logical (comparison, boolean, control) expressions. Here it is only needed to drop _id and expose its value as page again.
Run this aggregation pipeline to get the desired result:
db.collection.aggregate([
    {
        "$group": {
            "_id": "$page",
            "count": { "$sum": 1 },
            "impressions": { "$first": "$impressions" },
            "position": { "$first": "$position" }
        }
    },
    { "$match": { "count": 1 } },
    {
        "$project": {
            "_id": 0,
            "page": "$_id",
            "impressions": 1,
            "position": 1
        }
    }
])
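If it helps to see the logic without a database, here is a rough plain-Java sketch that mimics the three stages in memory on the sample data from the question. It does not use the MongoDB driver at all; the class name UniquePagesSketch, the Row type and the pagesSeenOnce method are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniquePagesSketch {

    // Hypothetical stand-in for one document in the collection.
    public static final class Row {
        public final String page;
        public final int impressions;
        public final int position;

        public Row(String page, int impressions, int position) {
            this.page = page;
            this.impressions = impressions;
            this.position = position;
        }
    }

    // Mimics $group by page, $match on count == 1, then $project.
    public static List<Row> pagesSeenOnce(List<Row> rows) {
        // "$group": bucket the rows by page, keeping first-seen order.
        Map<String, List<Row>> groups = new LinkedHashMap<>();
        for (Row row : rows) {
            groups.computeIfAbsent(row.page, k -> new ArrayList<>()).add(row);
        }
        // "$match" + "$project": keep groups of size 1 and emit their only row.
        List<Row> result = new ArrayList<>();
        for (List<Row> group : groups.values()) {
            if (group.size() == 1) {
                result.add(group.get(0));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("www.abc.com", 1, 144),
            new Row("www.abc.com", 1, 8),
            new Row("www.xyz.com", 7, 4),
            new Row("www.pqr.com", 1, 7),
            new Row("www.abc.com", 1, 19));
        for (Row r : pagesSeenOnce(rows)) {
            System.out.println(r.page + " " + r.impressions + " " + r.position);
        }
    }
}
```

Because only groups with a single document survive the filter, it makes no difference whether $first or $last is used for impressions and position. The sketch keeps first-seen order; the real $group stage does not guarantee any particular output order, so add a $sort stage if the order matters.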
Upvotes: 2