M.Guo
M.Guo

Reputation: 125

How to extend bloom filter when it runs out of space?

I am studying bloom filter algorithm. The concept is quite straight forward, below is my simple implementation of "bloom filter structure" in Java.
My question is how to extend the capacity when the bitset is almost full? If I change the size of the bitset, obvious I have to consider the hash functions again, and I have to re-arrange those exist elements.
Second thought is to initialize another instance of bloom filter. But these are only my thoughts, anyone can help with these? Thanks!

public class BloomFilter {

    private static final int DEFAULT_SIZE = 2 << 24;
    private static final int[] seeds = {7, 11, 13, 31, 37, 61};

    static class SimpleHash {
        private int cap;
        private int seed;

        public SimpleHash(int cap, int seed) {
            this.cap = cap;
            this.seed = seed;
        }

        public int hash(String str) {
            int result = 0;
            int length = str.length();
            for (int i = 0; i < length; i++) {
                result = seed * result + str.charAt(i);
            }
            return (cap - 1) & result;
        }
    }

    private BitSet bitSet;
    private SimpleHash[] hashes;

    public BloomFilter() {
        bitSet = new BitSet(DEFAULT_SIZE);
        hashes = new SimpleHash[seeds.length];
        for (int i = 0; i < seeds.length; i++) {
            hashes[i] = new SimpleHash(DEFAULT_SIZE, seeds[i]);
        }
    }

    public void add(String str) {
        for (SimpleHash hash : hashes) {
            bitSet.set(hash.hash(str), true);
        }
    }

    public boolean mightContains(String str) {
        if (str == null) {
            return false;
        }
        boolean result = true;
        for (SimpleHash hash : hashes) {
            result = result && bitSet.get(hash.hash(str));
        }

        return result;
    }
}

Upvotes: 2

Views: 2167

Answers (1)

gudok
gudok

Reputation: 4179

Bloom filter works only when you know number of elements to be inserted in advance. Usually you have desired false positive error P and number of elements to be inserted N, and you use them to compute number of hash functions H and capacity M.

If you don't know number of elements in advance, then the only way is to store all elements somewhere externally as you add them to bloom filter (for example, in file). When number of added elements exceeds safe threshold N, you:

  1. delete your current bloom filter instance
  2. create new bloom filter instance with new M and H derived from P and N*2 (or N*3/2)
  3. read all elements from file and insert them in new bloom filter instance

Upvotes: 2

Related Questions