Manjusha Bolishetty

Reputation: 1

Dynamic grouping and aggregation on List<Map<String, Object>> - Java 8

@Test
public void testAggregation() {
    List<Map<String, Object>> joinedList = new ArrayList<>();
    Map<String, Object> Myrecord = new HashMap<> ();
    Map<String, Object> Myrecord2 = new HashMap<> ();
    Map<String, Object> Myrecord3 = new HashMap<> ();

    Myrecord.put("ad_id", 8710);
    Myrecord.put("medium_type", 2);
    Myrecord.put("impressions", 36);
    joinedList.add(Myrecord);
    Myrecord2.put("ad_id", 8710);
    Myrecord2.put("medium_type", 2);
    Myrecord2.put("impressions", 1034);
    joinedList.add(Myrecord2);
    Myrecord3.put("ad_id", 9000);
    Myrecord3.put("medium_type", 2);
    Myrecord3.put("impressions", 10);
    joinedList.add(Myrecord3);
    System.out.println("joinedList: " + joinedList);
    //joinedList: [{ad_id=8710, impressions=36, medium_type=2}, {ad_id=8710, impressions=1034, medium_type=2}, {ad_id=9000, impressions=10, medium_type=2}]
}

I have a use case where I need to read rows with the same schema from two tables and aggregate the data from both. My plan is to query the tables separately, hold each result as a List<Map<String, Object>>, and merge the two lists. After merging, the sample data looks like this:

 //joinedList: [{ad_id=8710, medium_type=2, impressions=36}, {ad_id=8710, medium_type=2, impressions=1034}, {ad_id=9000, medium_type=2, impressions=10}]

I want to group by the dimensions (ad_id and medium_type here, but these are dynamic and depend on user input) and aggregate the metrics (which are also dynamic and depend on user input). In the example above, that means grouping by ad_id and medium_type and summing the impressions metric, so the result should be:

//final output: [{ad_id=8710, medium_type=2, impressions=1070}, {ad_id=9000, medium_type=2, impressions=10}]

NOTE: The group-by fields (ad_id and medium_type above) are dynamic and driven by user input, so they can be fields other than ad_id and medium_type. The same applies to the metrics: the user might be interested in impressions, clicks, metric3, metric4.

Upvotes: 0

Views: 307

Answers (1)

Naman

Reputation: 32036

This wouldn't fit in the comments, so I'm posting it as an answer. I believe your actual problem boils down to making the operations themselves dynamic. But, as pointed out in the comments, you would need a definite set of supported operations before deciding on an approach.

With the current description, you could for example do something along these lines (note that the code comments mark the spots that are likely the actual problem you're asking about):

// Requires Lombok for @AllArgsConstructor, plus java.util.*,
// java.util.function.* and java.util.stream.Collectors.
@AllArgsConstructor
static class Record {
    Integer adId;
    Integer mediumType;
    Long impressions;

    // note: this is an identity only for the sum operation
    static Record identity() {
        return new Record(0, 0, 0L);
    }

    static Function<Record, List<Object>> classifierToGroupBy() {
        return r -> Arrays.asList(r.adId, r.mediumType); // make this dynamic
    }

    static BinaryOperator<Record> mergeOperationInDownstream() {
        // take the key fields from b: on the first merge, a is the identity (0, 0)
        return (a, b) -> new Record(b.adId, b.mediumType,
                a.impressions + b.impressions); // dynamic field and operation selection
    }
}

public List<Record> processData(List<Record> records) {
    return new ArrayList<>(records.stream()
            .collect(Collectors.groupingBy(Record.classifierToGroupBy(),
                    Collectors.reducing(Record.identity(), Record.mergeOperationInDownstream())))
            .values());
}
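
If you stay with List<Map<String, Object>> instead of a typed Record, the same idea can be made dynamic by passing the dimension and metric names in as lists. The following is only a rough sketch under some assumptions that are not in your question: every metric value is a non-null Number, sum is the only aggregation needed, and the names DynamicAggregation, aggregate, keyOf, project and row are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DynamicAggregation {

    // Groups rows by the given dimension names and sums the given metric names.
    // Assumption: every metric value is a non-null Number; sums are kept as Long.
    static List<Map<String, Object>> aggregate(List<Map<String, Object>> rows,
                                               List<String> dimensions,
                                               List<String> metrics) {
        Map<List<Object>, Map<String, Object>> grouped = rows.stream()
                .collect(Collectors.toMap(
                        r -> keyOf(r, dimensions),            // dynamic classifier
                        r -> project(r, dimensions, metrics), // fresh, mutable copy per row
                        (a, b) -> {
                            // a and b share the same key, so only the metrics are merged
                            for (String m : metrics) {
                                a.put(m, (Long) a.get(m) + (Long) b.get(m));
                            }
                            return a;
                        },
                        LinkedHashMap::new));                 // keep first-seen group order
        return new ArrayList<>(grouped.values());
    }

    // The grouping key is the list of dimension values, in the order requested.
    static List<Object> keyOf(Map<String, Object> r, List<String> dimensions) {
        return dimensions.stream().map(r::get).collect(Collectors.toList());
    }

    // Copies only the requested columns and widens each metric to Long.
    static Map<String, Object> project(Map<String, Object> r,
                                       List<String> dimensions,
                                       List<String> metrics) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String d : dimensions) {
            out.put(d, r.get(d));
        }
        for (String m : metrics) {
            out.put(m, ((Number) r.get(m)).longValue());
        }
        return out;
    }

    // Hypothetical helper that builds one row shaped like the question's sample data.
    static Map<String, Object> row(int adId, int mediumType, int impressions) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("ad_id", adId);
        r.put("medium_type", mediumType);
        r.put("impressions", impressions);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> joinedList = Arrays.asList(
                row(8710, 2, 36), row(8710, 2, 1034), row(9000, 2, 10));
        System.out.println(aggregate(joinedList,
                Arrays.asList("ad_id", "medium_type"),
                Arrays.asList("impressions")));
    }
}
```

Supporting operations other than sum would mean replacing the hard-coded addition in the merge lambda with something looked up per metric, for example a Map<String, BinaryOperator<Long>>, which is where the fixed set of supported operations mentioned above comes in.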

Upvotes: 0
