sparker

Reputation: 62

Iterate big hashmap in parallel

I have a linked hash map that may contain up to 300k records. I want to iterate over this map in parallel to improve performance. The function iterates through the map of vectors and computes the dot product of a given vector against every other vector in the map. It also applies a check based on a date value, and it returns a nested HashMap.

This is the code using iterator:

public HashMap<String,HashMap<String,Double>> function1(String key, int days) {
    LocalDate date = LocalDate.now().minusDays(days);
    HashMap<String,Double> ret = new HashMap<>();
    HashMap<String,Double> ret2 = new HashMap<>();
    OpenMapRealVector v0 = map.get(key).value;
    for(Map.Entry<String, FixedTimeHashMap<OpenMapRealVector>> e: map.entrySet()) {
        if(!e.getKey().equals(key)) {
            Double d = v0.dotProduct(e.getValue().value);
            d = Double.parseDouble(new DecimalFormat("###.##").format(d));
            ret.put(e.getKey(),d);
            if(e.getValue().date.isAfter(date)){
                ret2.put(e.getKey(),d);
            }
        }
    }
    HashMap<String,HashMap<String,Double>> result = new HashMap<>();
    result.put("dot",ret);
    result.put("anomaly",ret2);
    return result;
}

Update: I looked into Java 8 streams, but I am running into ClassCastException and NullPointerException when using the parallel stream, as this map is being modified elsewhere.

Code:

public HashMap<String,HashMap<String,Double>> function1(String key, int days) {
    LocalDate date = LocalDate.now().minusDays(days);
    HashMap<String,Double> ret = new HashMap<>();
    HashMap<String,Double> ret2 = new HashMap<>();
    OpenMapRealVector v0 = map.get(key).value;
    synchronized (map) {
        map.entrySet().parallelStream().forEach(e -> {
            if(!e.getKey().equals(key)) {
                Double d = v0.dotProduct(e.getValue().value);
                d = Double.parseDouble(new DecimalFormat("###.##").format(d));
                ret.put(e.getKey(),d);
                if(e.getValue().date.isAfter(date)) {
                    ret2.put(e.getKey(),d);
                }
            }
        });
    }
}

I have synchronized the map usage, but it still gives me the following errors:

java.util.concurrent.ExecutionException: java.lang.ClassCastException
Caused by: java.lang.ClassCastException
Caused by: java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode

Also, should I split the map into multiple pieces and process each piece on a different thread in parallel?

Upvotes: 0

Views: 5318

Answers (2)

Ravindra Ranwala

Reputation: 21124

One possible solution using Java 8 would be:

Map<String, Double> dotMap = map.entrySet().stream().filter(e -> !e.getKey().equals(key))
        .collect(Collectors.toMap(Map.Entry::getKey, e -> Double
                .parseDouble(new DecimalFormat("###.##").format(v0.dotProduct(e.getValue().value)))));
Map<String, Double> anomalyMap = map.entrySet().stream().filter(e -> !e.getKey().equals(key))
        .filter(e -> e.getValue().date.isAfter(date))
        .collect(Collectors.toMap(Map.Entry::getKey, e -> Double
                .parseDouble(new DecimalFormat("###.##").format(v0.dotProduct(e.getValue().value)))));
result.put("dot", dotMap);
result.put("anomaly", anomalyMap);

Update

Here's a much more elegant solution:

Map<String, Map<String, Double>> resultMap = map.entrySet().stream().filter(e -> !e.getKey().equals(key))
        .collect(Collectors.groupingBy(e -> e.getValue().date.isAfter(date) ? "anomaly" : "dot",
                Collectors.toMap(Map.Entry::getKey, e -> Double.parseDouble(
                        new DecimalFormat("###.##").format(v0.dotProduct(e.getValue().value))))));

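Here we first group the entries under "anomaly" or "dot", and then use a downstream Collector to create a Map for each group. Note that groupingBy places each entry in exactly one group, so "dot" ends up holding only the entries whose date is not after the cutoff, whereas the loop in the question puts every entry into "dot". I have also updated the .filter() criteria based on the following suggestions.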
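If "dot" should hold every entry, as in the question's loop, one option is to compute the dot products once in parallel and then derive the "anomaly" subset from that result. The following is only a sketch under assumptions: TimedVector, dot and round2 are hypothetical stand-ins for the question's FixedTimeHashMap<OpenMapRealVector>, dotProduct and the DecimalFormat round trip (round2 may differ from DecimalFormat on exact ties), and the cutoff date is passed in rather than derived from LocalDate.now().

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class CollectorsDotSketch {
    // Hypothetical stand-in for the question's FixedTimeHashMap<OpenMapRealVector>.
    static final class TimedVector {
        final double[] value;
        final LocalDate date;
        TimedVector(double[] value, LocalDate date) {
            this.value = value;
            this.date = date;
        }
    }

    // Plain dense dot product, standing in for OpenMapRealVector.dotProduct.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Simplified two-decimal rounding, standing in for the DecimalFormat round trip.
    static double round2(double d) {
        return Math.round(d * 100.0) / 100.0;
    }

    static Map<String, Map<String, Double>> compute(
            Map<String, TimedVector> map, String key, LocalDate cutoff) {
        double[] v0 = map.get(key).value;

        // Pass 1 (parallel): one dot product per entry, collected into a concurrent map.
        Map<String, Double> dotMap = map.entrySet().parallelStream()
                .filter(e -> !e.getKey().equals(key))
                .collect(Collectors.toConcurrentMap(
                        Map.Entry::getKey,
                        e -> round2(dot(v0, e.getValue().value))));

        // Pass 2 (sequential, cheap): reuse the computed values for the recent entries.
        Map<String, Double> anomalyMap = dotMap.entrySet().stream()
                .filter(e -> map.get(e.getKey()).date.isAfter(cutoff))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

        Map<String, Map<String, Double>> result = new HashMap<>();
        result.put("dot", dotMap);
        result.put("anomaly", anomalyMap);
        return result;
    }

    public static void main(String[] args) {
        Map<String, TimedVector> map = new HashMap<>();
        map.put("a", new TimedVector(new double[] {1, 2}, LocalDate.of(2020, 1, 1)));
        map.put("b", new TimedVector(new double[] {3, 4}, LocalDate.of(2020, 1, 10)));
        map.put("c", new TimedVector(new double[] {0.5, 0.25}, LocalDate.of(2019, 12, 1)));
        System.out.println(compute(map, "a", LocalDate.of(2020, 1, 5)));
    }
}
```

This still assumes the source map is not modified while the stream runs; the collectors only make the result maps safe to fill from several threads.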

Upvotes: 2

Mehdi

Reputation: 775

You need to retrieve the Set<Map.Entry<K, V>> from the map.

Here's how you iterate over a Map using parallel streams in Java 8:

Map<String, String> myMap = new HashMap<> ();
myMap.entrySet ()
    .parallelStream ()
    .forEach (entry -> {
        String key = entry.getKey ();
        String value = entry.getValue ();
        // here add whatever processing you wanna do using the key / value retrieved
        // ret.put (....);
        // ret2.put (....)
    });

Clarification:

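The maps ret and ret2 should be declared as ConcurrentHashMap instances to allow concurrent inserts and updates from multiple threads. A plain HashMap is not thread-safe, and concurrent puts can corrupt its internal structure, which is consistent with the ClassCastException shown in the question.

So the declarations of the two maps become: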

Map<String,Double> ret = new ConcurrentHashMap<> ();
Map<String,Double> ret2 = new ConcurrentHashMap<> ();
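Putting the pieces together, a minimal sketch of the question's function might look like the following. It is not the asker's actual code: TimedVector, dot and round2 are hypothetical stand-ins for FixedTimeHashMap<OpenMapRealVector>, dotProduct and the DecimalFormat round trip, and the cutoff date is passed in as a parameter. Because the question says the source map is modified elsewhere, the sketch also copies it under a lock first; that only helps if every writer synchronizes on the same map object, which is an assumption here.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentResultSketch {
    // Hypothetical stand-in for the question's FixedTimeHashMap<OpenMapRealVector>.
    static final class TimedVector {
        final double[] value;
        final LocalDate date;
        TimedVector(double[] value, LocalDate date) {
            this.value = value;
            this.date = date;
        }
    }

    // Plain dense dot product, standing in for OpenMapRealVector.dotProduct.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Simplified two-decimal rounding, standing in for the DecimalFormat round trip.
    static double round2(double d) {
        return Math.round(d * 100.0) / 100.0;
    }

    static Map<String, Map<String, Double>> compute(
            Map<String, TimedVector> map, String key, LocalDate cutoff) {
        // Snapshot under the lock so the parallel stream never walks a map that
        // is being mutated (assumes writers also synchronize on this map).
        Map<String, TimedVector> snapshot;
        synchronized (map) {
            snapshot = new HashMap<>(map);
        }
        double[] v0 = snapshot.get(key).value;

        // Thread-safe result maps: several worker threads call put() at once.
        Map<String, Double> ret = new ConcurrentHashMap<>();
        Map<String, Double> ret2 = new ConcurrentHashMap<>();

        snapshot.entrySet().parallelStream().forEach(e -> {
            if (!e.getKey().equals(key)) {
                double d = round2(dot(v0, e.getValue().value));
                ret.put(e.getKey(), d);
                if (e.getValue().date.isAfter(cutoff)) {
                    ret2.put(e.getKey(), d);
                }
            }
        });

        Map<String, Map<String, Double>> result = new HashMap<>();
        result.put("dot", ret);
        result.put("anomaly", ret2);
        return result;
    }

    public static void main(String[] args) {
        Map<String, TimedVector> map = new HashMap<>();
        map.put("a", new TimedVector(new double[] {1, 2}, LocalDate.of(2020, 1, 1)));
        map.put("b", new TimedVector(new double[] {3, 4}, LocalDate.of(2020, 1, 10)));
        map.put("c", new TimedVector(new double[] {0.5, 0.25}, LocalDate.of(2019, 12, 1)));
        System.out.println(compute(map, "a", LocalDate.of(2020, 1, 5)));
    }
}
```

The snapshot costs one extra copy of the map per call, but it keeps the lock held only for the copy instead of for the whole parallel computation.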

Upvotes: 4
