ishaaq
ishaaq

Reputation: 6779

thread-safe bloomfilter

I am looking for a thread-safe BloomFilter implementation, i.e., an implementation that will behave exactly the same if values are put into the filter in sequence or simultaneously in parallel. The javadoc for guava's BloomFilter is silent about thread-safety. Is it thread-safe?

Upvotes: 3

Views: 1754

Answers (6)

Darrell Teague
Darrell Teague

Reputation: 4282

I wrote an early BloomFilter in Java (circa 1999) that is still in use today across many systems.

Since by their nature Bloom Filters can return false-positives in response to a contains() call, implementations like Guava realized there is actually no need for thread-safety. So the short answer to the OP question is whatever implementation there may be that has a thread-safe insert(key) method is a bit of a logical fallacy.

Consider the setup of a Bloom Filter:

A fixed size array of int (in my impl). All keys are first hashed so they resolve to a consistent, fixed length (use MurmurHash in my impl).

So now two concurrent threads write two different values:

  • They may produce the same hash value, producing a false-positive but upholding the contains(key) contract.

  • Otherwise the different hash values will go into a different location in the array - so they can never corrupt one another partially or otherwise.

Upvotes: 0

ponkin
ponkin

Reputation: 2441

I made java library with different bloom filters, they all are thread-safe. Check it out. ponkin/bloom

Upvotes: 0

0100110010101
0100110010101

Reputation: 6729

I've adopted the Apache Cassandra version to support concurrent updates, as we required this property internally. You can check it out here: https://github.com/ifesdjeen/blomstre

Upvotes: 0

UnixShadow
UnixShadow

Reputation: 1244

I had this problem last week.

Ended up creating a wrapper class for guava bloom filter, where I created a number of filters. Then depending on key to insert I selected one of the filters and synchronized on that alone. Running with 256 filters and 8 concurrent threads I reached 90% CPU utilization even tho the added synchronization.

Upvotes: 0

Ben Taitelbaum
Ben Taitelbaum

Reputation: 7403

As others have noted, the best solution is to patch an existing framework to make it thread-safe. But for a more immediate need, consider wrapping the non-thread-safe version in a thread-safe façade:

public class ThreadSafeBloomFilter {
  private BloomFilter filter;

  public ThreadSafeBloomFilter(BloomFilter filter) {
    this.filter = filter;
  }

  public synchronized void add(Object obj) {
    filter.add(obj);
  }
}

Implement the other methods you need in a similar manner. You probably also want to make it nice and generic as well (like a ThreadSafeBloomFilter, but that's your call).

In terms of performance, this will serialize all the requests, so you won't experience any performance gains by parallelizing access to it (and it could create a bottleneck).

Upvotes: 0

Louis Wasserman
Louis Wasserman

Reputation: 198311

Guava's BloomFilter isn't thread-safe.

That said, it's not 100% clear to me what semantics you'd expect for a thread-safe BloomFilter -- would put be atomic? I'm imagining the effects of replacing the long[] with an AtomicLongArray, and I'm pretty sure the semantics you end up with are "if put(x) happens-before mightContain(x), then mightContain(x) returns true," and I think those semantics are useful, but I'm not positive.

This might be worth filing an issue with a specific use case, but there would be some details to work out about what thread-safety would mean here, exactly.

Upvotes: 2

Related Questions