KateS

Reputation: 33

Parallel streams

There is a function that calculates the most frequent name in a `Human[] people` array in parallel, but it has a data race. Why?

    Map<String, Integer> nameMap = new ConcurrentHashMap<>();
    Arrays.stream(people)
            .parallel()
            .filter(p -> p.isAdult())
            .map(Human::getName)
            .forEach(p -> nameMap.put(p, nameMap.containsKey(p) ? nameMap.get(p) + 1 : 1));
    return nameMap.entrySet().parallelStream()
            .max((entry1, entry2) -> entry1.getValue() > entry2.getValue() ? 1 : -1)
            .get()
            .getKey();

Upvotes: 3

Views: 86

Answers (1)

Eugene

Reputation: 121048

Because you are doing a get, then an increment, and then a put; in between, another thread might already have put or updated that entry in nameMap, so its update is silently overwritten.

You could have used ConcurrentHashMap#merge, which is atomic here, or better, used Collectors.toConcurrentMap.
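A sketch of the merge-based fix (the Human type here is a hypothetical stand-in for the one in the question, assumed to expose isAdult() and getName()):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MergeExample {
    // Hypothetical Human, modeled on the question's accessors
    record Human(String name, boolean adult) {
        String getName() { return name; }
        boolean isAdult() { return adult; }
    }

    static String mostFrequentName(Human[] people) {
        Map<String, Integer> nameMap = new ConcurrentHashMap<>();
        Arrays.stream(people)
              .parallel()
              .filter(Human::isAdult)
              .map(Human::getName)
              // merge performs the read-modify-write atomically,
              // so concurrent threads cannot lose each other's updates
              .forEach(name -> nameMap.merge(name, 1, Integer::sum));
        return nameMap.entrySet().stream()
                      .max(Map.Entry.comparingByValue())
                      .map(Map.Entry::getKey)
                      .orElseThrow();
    }

    public static void main(String[] args) {
        Human[] people = {
            new Human("Ann", true), new Human("Bob", true),
            new Human("Ann", true), new Human("Cid", false)
        };
        System.out.println(mostFrequentName(people)); // prints Ann
    }
}
```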

EDIT

You could probably have done it a bit more clearly:

  Arrays.stream(people)
        .parallel()
        .filter(Human::isAdult)
        .collect(Collectors.groupingBy(Human::getName, Collectors.counting()))
        .entrySet()
        .stream()
        .max(Comparator.comparing(Entry::getValue))
        .map(Entry::getKey)
        .get();

Just note that I am close to certain you don't need parallel at all here.
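For completeness, the Collectors.toConcurrentMap variant mentioned above could look like the sketch below (again with a hypothetical Human stand-in); the collector accumulates into a single ConcurrentHashMap, so the parallel stream needs no external mutable state:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class ToConcurrentMapExample {
    // Hypothetical Human, modeled on the question's accessors
    record Human(String name, boolean adult) {
        String getName() { return name; }
        boolean isAdult() { return adult; }
    }

    static String mostFrequentName(Human[] people) {
        // toConcurrentMap(keyMapper, valueMapper, mergeFunction):
        // each name maps to 1, collisions are summed atomically
        Map<String, Integer> counts = Arrays.stream(people)
                .parallel()
                .filter(Human::isAdult)
                .collect(Collectors.toConcurrentMap(
                        Human::getName, h -> 1, Integer::sum));
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow();
    }

    public static void main(String[] args) {
        Human[] people = {
            new Human("Ann", true), new Human("Bob", true),
            new Human("Ann", true), new Human("Cid", false)
        };
        System.out.println(mostFrequentName(people)); // prints Ann
    }
}
```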

Upvotes: 3
