user710818
user710818

Reputation: 24248

Java algorithm to track parts of aggregated values

My program have evaluate hundreds of millions of records. So the question of memory and performance are important. Lets each record has key - ticketID. Also record has field value and field source_name. In source ticketID have from 1 to many (neary 100) source_name. I need aggregate only by ticketID - receive nearly 1 million of record, but also must have possibility subtract values for specified source_name - so I have track contributes.

Do exist some algorithms or data structures that allow resolve this problem?

Upvotes: 0

Views: 583

Answers (1)

Gray
Gray

Reputation: 116908

I can't quite parse the question fully so I'll assume:

  • "nearly 1 million of record" means that there is nearly 1 million unique ticketID fields.
  • "nearly 100" different source_names in the system.
  • not all ticketIds have source_names. We don't have 100 million ticketID x source_name combinations.
  • You want to be able to total all of the ticketIds but also total by source_name.

With these assumptions I would use a Map of maps. The outer Map has a key of source_name and the value of the inner Map. The inner Map has a key of the ticketId and a cumulative value.

So the pseudo code would look something like:

Map<String, Map<Integer,Double>> valueMap =
    new HashMap<String, Map<Integer,Double>>();

while (...reading in and processing data...) {
    int ticketId = ...;
    String sourceName = ...;
    double entryValue = ...;

    Map<Integer,Double> sourceNameMap = valueMap.get(sourceName);
    Double value = sourceNameMap.get(ticketId);
    if (oldValue == null) {
        value = entryValue;
    } else {
        value += entryValue;
    }
    sourceNameMap.put(ticketId, value);
}

You can easily get the total by adding up each of the source_name maps. You can also keep a running total for each source_name of course if that helps. If your system can allocate a gigabyte to the JVM then it should be able to handle a good number of ticketID x source_name pairs.

You might consider creating a mutable internal value class to save on GC cycles:

private static class MutableValue {
    double value;
    public MutableValue(double value) {
        this.value = value;
    }
    public void add(double value) {
        this.value += value;
    }
}

So then you can say:

MutableValue value = sourceNameMap.get(ticketId);
if (oldValue == null) {
    sourceNameMap.put(new MutableValue(entryValue));
} else {
    value.add(entryValue);
}

If you edit your question, I'll edit my answer in case I've made some improper assumptions.

Upvotes: 2

Related Questions