Reputation: 69259
As we know, iterating over a concurrent collection is not thread safe by default, so one cannot use:
Set<E> set = Collections.synchronizedSet(new HashSet<>());
//fill with data
for (E e : set) {
process(e);
}
This happens as data may be added during iteration, because there is no exclusive lock on set
.
This is describe in the javadoc of Collections.synchronizedSet
:
public static Set synchronizedSet(Set s)
Returns a synchronized (thread-safe) set backed by the specified set. In order to guarantee serial access, it is critical that all access to the backing set is accomplished through the returned set.
It is imperative that the user manually synchronize on the returned set when iterating over it:
Set s = Collections.synchronizedSet(new HashSet());
...
synchronized (s) { Iterator i = s.iterator(); // Must be in the synchronized block while (i.hasNext()) foo(i.next()); }
Failure to follow this advice may result in non-deterministic behavior.
However, this does not apply to Set.forEach
, which inherits the default method forEach
from Iterable.forEach.
Now I looked into the source code, and here we can see that we have the following structure:
Collections.synchronizedSet()
.We get one:
public static <T> Set<T> synchronizedSet(Set<T> s) {
return new SynchronizedSet<>(s);
}
...
static class SynchronizedSet<E>
extends SynchronizedCollection<E>
implements Set<E> {
private static final long serialVersionUID = 487447009682186044L;
SynchronizedSet(Set<E> s) {
super(s);
}
SynchronizedSet(Set<E> s, Object mutex) {
super(s, mutex);
}
public boolean equals(Object o) {
if (this == o)
return true;
synchronized (mutex) {return c.equals(o);}
}
public int hashCode() {
synchronized (mutex) {return c.hashCode();}
}
}
It extends SynchronizedCollection
, which has the following interesting methods next to the obvious ones:
// Override default methods in Collection
@Override
public void forEach(Consumer<? super E> consumer) {
synchronized (mutex) {c.forEach(consumer);}
}
@Override
public boolean removeIf(Predicate<? super E> filter) {
synchronized (mutex) {return c.removeIf(filter);}
}
@Override
public Spliterator<E> spliterator() {
return c.spliterator(); // Must be manually synched by user!
}
@Override
public Stream<E> stream() {
return c.stream(); // Must be manually synched by user!
}
@Override
public Stream<E> parallelStream() {
return c.parallelStream(); // Must be manually synched by user!
}
The mutex
used here is the same object as to which all operations of Collections.synchronizedSet
lock to.
Now we can, judging by the implementation say that it is thread safe to use Collections.synchronizedSet(...).forEach(...)
, but is it also thread safe by specification?
(Confusingly enough, Collections.synchronizedSet(...).stream().forEach(...)
is not thread safe by implementation, and the verdict of the specification seems to be unknown aswell.)
Upvotes: 29
Views: 6391
Reputation: 24721
As @Holger said, the doc clearly says user must manually synchronize collections returned by Collections.synchronizedXyz()
when tranversing it via Iterator
:
It is imperative that the user manually synchronize on the returned collection when traversing it via Iterator, Spliterator or Stream:
Collection c = Collections.synchronizedCollection(myCollection); ... synchronized (c) { Iterator i = c.iterator(); // Must be in the synchronized block while (i.hasNext()) foo(i.next()); }
I want to explain a bit more about code.
Consider Collections.synchronizedList()
method. It returns Collections.SynchronizedList class instance, which extends SynchronizedCollection
which defines iterator()
as follows:
public Iterator<E> iterator() {
return c.iterator(); // Must be manually synched by user!
}
Compare this with other methods of SynchronizedCollections
, for example:
public String toString() {
synchronized (mutex) {return c.toString();}
}
Thus SynchronizedList
inherits iterator()
from SynchronizedCollection
, which needs to be synched manually by user.
Upvotes: 0
Reputation: 6533
As you wrote, judging by implementation, forEach()
is thread-safe for the collections provided with JDK (see disclaimer below) as it requires monitor of mutex to be acquired to proceed.
Is it also thread safe by specification?
My opinion - no, and here is an explanation. Collections.synchronizedXXX()
javadoc, rewritten in short words, says - "all methods are thread-safe except for those used for iterating over it".
My other, although very subjective argument is what yshavit wrote - unless told/read that, consider API/class/whatever not thread-safe.
Now, let's take a closer look at the javadocs. I guess I may state that method forEach()
is used to iterate over it, so, following the advice from javadoc, we should consider it not thread-safe, although it is opposite to reality (implementation).
Anyway, I agree with yshavit's statement that the documentation should be updated as this is most likely a documentation, not implementation flaw. But, no one can say for sure except for JDK developers, see concerns below.
The last point I'd like to mention within this discussion - we can assume that custom collection can be wrapped with Collections.synchronizedXXX()
, and the implementation of forEach()
of this collection can be... can be anything. The collection might perform asynchronous processing of elements within the forEach()
method, spawn a thread for each element... it is bounded only by author's imagination, and synchronized(mutex) wrap cannot guarantee thread-safety for such cases. That particular issue might be the reason not to declare forEach()
method as thread-safe..
Upvotes: 11
Reputation: 298123
It’s worth to have a look at the documentation of Collections.synchronizedCollection
rather than Collections.synchronizedSet()
as that documentation has been cleaned up already:
It is imperative that the user manually synchronize on the returned collection when traversing it via
Iterator
,Spliterator
orStream
: …
I think, this makes it pretty clear that there is a distinction between the iteration via an object other than the synchronized Collection
itself and using its forEach
method. But even with the old wording you can draw the conclusion that there is such a distinction:
It is imperative that the user manually synchronize on the returned set when iterating over it:…
(emphasis by me)
Compare to the documentation for Iterable.forEach
:
Performs the given action for each element of the
Iterable
until all elements have been processed or the action throws an exception.
While it is clear to the developer that there must be an (internal) iteration happening to achieve this, this iteration is an implementation detail. From the given specification’s wording it’s just a (meta-)action for performing an action to each element.
When using that method, the user is not iterating over the elements and hence not responsible for the synchronization mentioned in the Collections.synchronized…
documentation.
However, that’s a bit subtle and it’s good that the documentation of synchronizedCollection
lists the cases for manual synchronization explicitly and I think the documentation of the other methods should be adapted as well.
Upvotes: 6