Qrious

Reputation: 677

Aggregate information using Java 8 streams

I'm still trying to get a full grasp of the Stream API in Java 8 and was hoping for some help.

I have the class described below; a database call returns a list of its instances.

class VisitSummary {
    String source;
    DateTime timestamp;
    Integer errorCount;
    Integer trafficCount;
    //Other fields
}

To produce some useful summary information from this data, I have a class VisitSummaryBySource, which holds the totals for all visits from a source (for a given timeframe):

class VisitSummaryBySource {
    String sourceName;
    Integer recordCount;
    Integer errorCount;
}

I was hoping to construct a List<VisitSummaryBySource> which, as the name suggests, holds one VisitSummaryBySource object per distinct source, containing the total number of records and errors for that source.

Is there a way to achieve this with streams in a single operation, or do I necessarily need to break it down into multiple operations? The best I could come up with is the following, for the record counts:

    Map<String, Integer> recordsBySrc = data.parallelStream()
        .collect(Collectors.groupingBy(VisitSummaryBySource::getSource,
                 Collectors.summingInt(VisitSummaryBySource::getRecordCount)));

and, for the error counts:

    Map<String, Integer> errorsBySrc = data.parallelStream()
        .collect(Collectors.groupingBy(VisitSummaryBySource::getSource,
                 Collectors.summingInt(VisitSummaryBySource::getErrorCount)));

Then I merge the two maps to build the list I'm looking for.
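For reference, here is a minimal, self-contained sketch of that two-pass approach, including the map-merging step. It assumes a constructor and getters on VisitSummaryBySource (the class above lists only fields; getSource() is kept because that is the name the code calls), uses sequential streams for simplicity, and the sample sources "web" and "mobile" are made up for illustration.

```java
import java.util.*;
import java.util.stream.Collectors;

// Stand-in for VisitSummaryBySource: the constructor and getters are assumed.
class VisitSummaryBySource {
    private final String sourceName;
    private final Integer recordCount;
    private final Integer errorCount;

    VisitSummaryBySource(String sourceName, Integer recordCount, Integer errorCount) {
        this.sourceName = sourceName;
        this.recordCount = recordCount;
        this.errorCount = errorCount;
    }

    String getSource() { return sourceName; }
    Integer getRecordCount() { return recordCount; }
    Integer getErrorCount() { return errorCount; }
}

class TwoPassSummary {
    static List<VisitSummaryBySource> summarize(List<VisitSummaryBySource> data) {
        // Pass 1: total records per source.
        Map<String, Integer> recordsBySrc = data.stream()
            .collect(Collectors.groupingBy(VisitSummaryBySource::getSource,
                     Collectors.summingInt(VisitSummaryBySource::getRecordCount)));
        // Pass 2: total errors per source.
        Map<String, Integer> errorsBySrc = data.stream()
            .collect(Collectors.groupingBy(VisitSummaryBySource::getSource,
                     Collectors.summingInt(VisitSummaryBySource::getErrorCount)));
        // Merge: both maps have the same keys, so walk one and look up the other.
        List<VisitSummaryBySource> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : recordsBySrc.entrySet()) {
            result.add(new VisitSummaryBySource(
                e.getKey(), e.getValue(), errorsBySrc.get(e.getKey())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<VisitSummaryBySource> data = Arrays.asList(
            new VisitSummaryBySource("web", 3, 1),
            new VisitSummaryBySource("mobile", 2, 0),
            new VisitSummaryBySource("web", 4, 2));
        // Order of the output is unspecified (groupingBy uses a HashMap).
        for (VisitSummaryBySource v : summarize(data)) {
            System.out.println(v.getSource() + " " + v.getRecordCount() + " " + v.getErrorCount());
        }
    }
}
```

This walks the data twice and needs an extra merge step, which is exactly the cost the question is trying to avoid.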

Upvotes: 2

Views: 2150

Answers (1)

Stuart Marks

Reputation: 132370

You're on the right track. The uses of Collectors.summingInt are examples of downstream collectors of the outer groupingBy collector. Each one extracts a single integer value from every VisitSummaryBySource instance in a group and sums those values. This is essentially a reduction over integers.

The problem, as you note, is that each pass can extract and reduce only one of the integer values, so you need a second pass to extract and reduce the other one.

The key is to perform the reduction not over the individual integer values but over entire VisitSummaryBySource objects. Reduction takes a BinaryOperator, which takes two instances of the type in question and combines them into one. Here's how to do that by adding a static method to VisitSummaryBySource:

// Combines two summaries for the same source into one, summing the counts.
static VisitSummaryBySource merge(VisitSummaryBySource a,
                                  VisitSummaryBySource b) {
    assert a.getSource().equals(b.getSource());
    return new VisitSummaryBySource(a.getSource(),
                                    a.getRecordCount() + b.getRecordCount(),
                                    a.getErrorCount() + b.getErrorCount());
}

Note that we're not actually merging the source names. Since this reduction is performed only within a group, where the source names are all the same, we assert that the two instances being merged have the same name. We also assume the obvious constructor taking a name, a record count, and an error count, and use it to create the merged object containing the summed counts.
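To see merge acting as a BinaryOperator on its own, here is a small sketch that reduces every summary for one source with Stream.reduce, outside of any collector. The constructor, getters, toString, the helper name totalFor, and the sample data are assumptions for illustration. Stream.reduce without an identity returns an Optional, which is empty when no element matches.

```java
import java.util.*;
import java.util.function.BinaryOperator;

// Stand-in for VisitSummaryBySource: constructor, getters and toString are
// assumed; merge is copied from the answer.
class VisitSummaryBySource {
    private final String sourceName;
    private final Integer recordCount;
    private final Integer errorCount;

    VisitSummaryBySource(String sourceName, Integer recordCount, Integer errorCount) {
        this.sourceName = sourceName;
        this.recordCount = recordCount;
        this.errorCount = errorCount;
    }

    String getSource() { return sourceName; }
    Integer getRecordCount() { return recordCount; }
    Integer getErrorCount() { return errorCount; }

    static VisitSummaryBySource merge(VisitSummaryBySource a, VisitSummaryBySource b) {
        assert a.getSource().equals(b.getSource());
        return new VisitSummaryBySource(a.getSource(),
                                        a.getRecordCount() + b.getRecordCount(),
                                        a.getErrorCount() + b.getErrorCount());
    }

    @Override
    public String toString() {
        return sourceName + ": records=" + recordCount + ", errors=" + errorCount;
    }
}

class MergeDemo {
    // Hypothetical helper: reduce all summaries for one source using merge.
    static Optional<VisitSummaryBySource> totalFor(String source,
                                                   List<VisitSummaryBySource> data) {
        BinaryOperator<VisitSummaryBySource> op = VisitSummaryBySource::merge;
        return data.stream()
                   .filter(v -> v.getSource().equals(source)) // keep one "group"
                   .reduce(op);                               // empty if nothing matched
    }

    public static void main(String[] args) {
        List<VisitSummaryBySource> data = Arrays.asList(
            new VisitSummaryBySource("web", 3, 1),
            new VisitSummaryBySource("mobile", 2, 0),
            new VisitSummaryBySource("web", 4, 2));
        System.out.println(totalFor("web", data));   // Optional[web: records=7, errors=3]
        System.out.println(totalFor("email", data)); // Optional.empty
    }
}
```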

Now our stream looks like this:

    Map<String, Optional<VisitSummaryBySource>> map =
        data.stream()
            .collect(groupingBy(VisitSummaryBySource::getSource,
                                reducing(VisitSummaryBySource::merge)));

Note that this reduction produces map values of type Optional<VisitSummaryBySource>. This is somewhat odd; we'll deal with it below. We could avoid the Optional by using another form of the reducing collector that takes an identity value. This is possible but somewhat nonsensical, as there's no good value to use for the source name of the identity. (We could use something like the empty string, but we'd have to abandon our assertion that we merge only objects whose source names are equal.)
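For comparison, here is a sketch of that identity-value form, assuming an identity with an empty source name and zero counts. Because the equal-names assertion has to go, it uses a hypothetical mergeWithIdentity that takes the name from whichever side is non-empty. The map values then come out as plain VisitSummaryBySource objects instead of Optionals. The constructor, getters, and sample data are also assumptions.

```java
import java.util.*;
import static java.util.stream.Collectors.*;

// Stand-in for VisitSummaryBySource: the constructor and getters are assumed.
class VisitSummaryBySource {
    private final String sourceName;
    private final Integer recordCount;
    private final Integer errorCount;

    VisitSummaryBySource(String sourceName, Integer recordCount, Integer errorCount) {
        this.sourceName = sourceName;
        this.recordCount = recordCount;
        this.errorCount = errorCount;
    }

    String getSource() { return sourceName; }
    Integer getRecordCount() { return recordCount; }
    Integer getErrorCount() { return errorCount; }
}

class IdentitySummary {
    // Hypothetical identity value: empty name, zero counts.
    static final VisitSummaryBySource IDENTITY = new VisitSummaryBySource("", 0, 0);

    // Hypothetical merge without the equal-names assertion: the empty name of
    // the identity is replaced by the real name from the other argument.
    static VisitSummaryBySource mergeWithIdentity(VisitSummaryBySource a,
                                                  VisitSummaryBySource b) {
        String name = a.getSource().isEmpty() ? b.getSource() : a.getSource();
        return new VisitSummaryBySource(name,
                                        a.getRecordCount() + b.getRecordCount(),
                                        a.getErrorCount() + b.getErrorCount());
    }

    static Map<String, VisitSummaryBySource> summarize(List<VisitSummaryBySource> data) {
        return data.stream()
            .collect(groupingBy(VisitSummaryBySource::getSource,
                                reducing(IDENTITY, IdentitySummary::mergeWithIdentity)));
    }

    public static void main(String[] args) {
        List<VisitSummaryBySource> data = Arrays.asList(
            new VisitSummaryBySource("web", 3, 1),
            new VisitSummaryBySource("mobile", 2, 0),
            new VisitSummaryBySource("web", 4, 2));
        summarize(data).forEach((src, v) ->
            System.out.println(src + " " + v.getRecordCount() + " " + v.getErrorCount()));
    }
}
```

It works, but the special-cased empty name is exactly the awkwardness described above, which is why the Optional route is preferred here.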

We don't really care about the map; it only needs to be kept around long enough to reduce the VisitSummaryBySource instances. Once that's done, we can just pull out the map values using values() and throw away the map.

We can then turn these values back into a stream and unwrap each Optional by mapping it through Optional::get. This is safe because groupingBy creates a map entry only for a key that at least one element has, so every group is non-empty and every Optional holds a value.

Finally, we collect the results into a list.

The final code looks like this:

    List<VisitSummaryBySource> output =
        data.stream()
            .collect(groupingBy(VisitSummaryBySource::getSource,
                                reducing(VisitSummaryBySource::merge)))
            .values().stream()
            .map(Optional::get)
            .collect(toList());
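Here is a complete, runnable version of the final code. It assumes a constructor, getters, and a toString on VisitSummaryBySource, and the sample data is made up. It also includes a variant that is not part of the answer: Collectors.collectingAndThen applies Optional::get to each group's result inside the collector, which is safe for the same reason given above (no group is ever empty) and yields a Map<String, VisitSummaryBySource> directly.

```java
import java.util.*;
import static java.util.stream.Collectors.*;

// Stand-in for VisitSummaryBySource: constructor, getters and toString are
// assumed; merge is copied from the answer.
class VisitSummaryBySource {
    private final String sourceName;
    private final Integer recordCount;
    private final Integer errorCount;

    VisitSummaryBySource(String sourceName, Integer recordCount, Integer errorCount) {
        this.sourceName = sourceName;
        this.recordCount = recordCount;
        this.errorCount = errorCount;
    }

    String getSource() { return sourceName; }
    Integer getRecordCount() { return recordCount; }
    Integer getErrorCount() { return errorCount; }

    static VisitSummaryBySource merge(VisitSummaryBySource a,
                                      VisitSummaryBySource b) {
        assert a.getSource().equals(b.getSource());
        return new VisitSummaryBySource(a.getSource(),
                                        a.getRecordCount() + b.getRecordCount(),
                                        a.getErrorCount() + b.getErrorCount());
    }

    @Override
    public String toString() {
        return sourceName + ": records=" + recordCount + ", errors=" + errorCount;
    }
}

class SummaryDemo {
    // The answer's pipeline: group, reduce each group, unwrap, collect.
    static List<VisitSummaryBySource> summarize(List<VisitSummaryBySource> data) {
        return data.stream()
            .collect(groupingBy(VisitSummaryBySource::getSource,
                                reducing(VisitSummaryBySource::merge)))
            .values().stream()
            .map(Optional::get)
            .collect(toList());
    }

    // Variant (not from the answer): unwrap each group's Optional inside the
    // collector, so the map values are already VisitSummaryBySource.
    static Map<String, VisitSummaryBySource> summarizeBySource(List<VisitSummaryBySource> data) {
        return data.stream()
            .collect(groupingBy(VisitSummaryBySource::getSource,
                                collectingAndThen(reducing(VisitSummaryBySource::merge),
                                                  Optional::get)));
    }

    public static void main(String[] args) {
        List<VisitSummaryBySource> data = Arrays.asList(
            new VisitSummaryBySource("web", 3, 1),
            new VisitSummaryBySource("mobile", 2, 0),
            new VisitSummaryBySource("web", 4, 2));
        // groupingBy uses a HashMap, so the output order is unspecified.
        summarize(data).forEach(System.out::println);
        System.out.println(summarizeBySource(data));
    }
}
```

Either way the data is traversed only once, and each source's totals are built by merge within its group.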

Upvotes: 1
