Reputation: 13506
I am learning Bloom filter
and BitMap
(also known as Bit Array) and met a question,can someone give me some instructions on when to use Bloom filter and when to use BitMap?
In my understanding I think that when we need to find the largest number or want to sort the huge data,BitMap is more suitable(for pure digit).
If we want to check some IP address are contained in billions of existed records,then Bloom filter is more suitable(for string or other none pure digit).
However,I want to someone to give me more detailed instructions or suggestions,I have searched on Google and do not find some useful info. Thanks in advance!
Also I do not know if shall I put this question on stackoverflow or other sites,if it's not the right site,hope someone can point it out,thanks!
Upvotes: 3
Views: 3926
Reputation: 50127
When to use a Bloom filter: if you have a set (a list of unique entries) and a hash function, you can create a Bloom filter. It allows queries of type "is entry x likely in the set?". The query will return true for sure if the entry is in the set. For entries not in the set, it may return true, with a low probability, typically 1% or lower (depending on the size of the Bloom filter). You can make the Bloom filter as small as you like, but for a false positive rate of around 1% you need about 10 bits per entry. There are alternative algorithms / data structures that use less space, see e.g. https://github.com/FastFilter. A Bloom filter internally uses a bit array by the way.
When to use a bit array (also called bitset): if the entries are numbers between 0..n, then you can use a bit array as follows: set the bit x for each entry. That will require n bits (no matter how many entries). So if your entries can be large numbers, then there is a problem: it will use a lot of memory. However, you could create a sparse bit array (also called compressed bit array), e.g. using https://roaringbitmap.org/. You won't have false positives as with Bloom filters, but size usage depends a lot on your data (on the numbers you have), much more so than with a Bloom filter.
Upvotes: 7