A.K.

Reputation: 71

How to filter data without repeated values in MongoDB with Java

I'm trying to get data from MongoDB without repeated values. I want to filter the following data:

{"page":"www.abc.com","impressions":1,"position":144}
{"page":"www.abc.com","impressions":1,"position":8}
{"page":"www.xyz.com","impressions":7,"position":4}
{"page":"www.pqr.com","impressions":1,"position":7}
{"page":"www.abc.com","impressions":1,"position":19}

to get the following result, keeping only the pages that appear exactly once. Any idea how I should do that?

{"page":"www.xyz.com","impressions":7,"position":4}
{"page":"www.pqr.com","impressions":1,"position":7}

Upvotes: 2

Views: 340

Answers (2)

Mike Shauneu

Reputation: 3289

With the MongoDB Java driver 3.0+, it could look like this:

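import com.mongodb.MongoClient;
import com.mongodb.client.AggregateIterable;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.Arrays;

// The class name is arbitrary; any class with a main method will do.
public class AggregationExample {
    public static void main(String[] args) {
        try (MongoClient client = new MongoClient("127.0.0.1")) {
            MongoCollection<Document> col = client.getDatabase("test").getCollection("test");

            // $group: one document per page, counting how often each page occurs
            Document groupFields = new Document("_id", "$page");
            groupFields.put("count", new Document("$sum", 1));
            groupFields.put("impressions", new Document("$first", "$impressions"));
            groupFields.put("position", new Document("$first", "$position"));

            // $match: keep only pages that occur exactly once
            Document matchFields = new Document("count", 1);

            // $project: rename _id back to page and drop the count field
            Document projectFields = new Document("_id", 0);
            projectFields.put("page", "$_id");
            projectFields.put("impressions", 1);
            projectFields.put("position", 1);

            AggregateIterable<Document> output = col.aggregate(Arrays.asList(
                    new Document("$group", groupFields),
                    new Document("$match", matchFields),
                    new Document("$project", projectFields)
            ));

            for (Document doc : output) {
                System.out.println(doc);
            }
        }
    }
}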

For your data, the output is:

Document{{impressions=1.0, position=7.0, page=www.pqr.com}}
Document{{impressions=7.0, position=4.0, page=www.xyz.com}}

Upvotes: 2

chridam

Reputation: 103435

You should be able to run an aggregation pipeline that groups the documents by the page field using the $group pipeline operator, gets a count of the documents in each group using the $sum operator and retains the other two fields using the $first (or $last) operator.

The next stage after $group should filter the grouped documents on the count field, keeping only groups whose count is 1, i.e. filtering the duplicated pages out of the result. Use the $match pipeline operator for this.

A final, cosmetic stage uses $project, which reshapes each document in the stream: it can include, exclude or rename fields, inject computed fields, and create sub-document fields using mathematical, date, string and/or logical (comparison, boolean, control) expressions. Here it renames _id back to page and drops the count field.

Run this aggregation pipeline to get the desired result:

db.collection.aggregate([
    { 
        "$group": {
            "_id": "$page",
            "count": { "$sum": 1 },
            "impressions": { "$first": "$impressions" },
            "position": { "$first": "$position" }
        }
    },
    { "$match": { "count": 1 } },
    {
        "$project": {
            "_id": 0,
            "page": "$_id",
            "impressions": 1,
            "position": 1
        }
    }
])
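
To see what the three stages compute without a running server, the same group, count and filter logic can be sketched in plain Java on an in-memory list. This is only an illustration, not driver code: the UniquePages class, the Row type and the sample() and keepUniquePages() methods are hypothetical names made up for this sketch, and the fixed output order comes from the LinkedHashMap used here, which a real $group stage does not promise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniquePages {
    // Hypothetical stand-in for one document in the collection.
    public static final class Row {
        public final String page;
        public final int impressions;
        public final int position;

        public Row(String page, int impressions, int position) {
            this.page = page;
            this.impressions = impressions;
            this.position = position;
        }
    }

    // The five documents from the question.
    public static List<Row> sample() {
        return Arrays.asList(
                new Row("www.abc.com", 1, 144),
                new Row("www.abc.com", 1, 8),
                new Row("www.xyz.com", 7, 4),
                new Row("www.pqr.com", 1, 7),
                new Row("www.abc.com", 1, 19));
    }

    public static List<Row> keepUniquePages(List<Row> rows) {
        // Equivalent of $group: bucket the rows by page (first-seen order).
        Map<String, List<Row>> byPage = new LinkedHashMap<>();
        for (Row r : rows) {
            byPage.computeIfAbsent(r.page, k -> new ArrayList<>()).add(r);
        }
        // Equivalent of $match on count == 1: keep groups with a single row.
        List<Row> result = new ArrayList<>();
        for (List<Row> group : byPage.values()) {
            if (group.size() == 1) {
                result.add(group.get(0));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Row r : keepUniquePages(sample())) {
            System.out.println(r.page + " " + r.impressions + " " + r.position);
        }
        // prints:
        // www.xyz.com 7 4
        // www.pqr.com 1 7
    }
}
```

Because only groups of size one survive the filter, taking the first row of each group is unambiguous, which is also why $first and $last give the same result in the pipeline above.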

Upvotes: 2
