Reputation: 172
So I understand that bitset vectors can essentially store true/false sets for you in each bit, however I'm confused as to the difference between that and a bloom filter, I understand bloom filters make use of hashing functions and can return false positives, however what's the actual difference in the type of data they can store/functions they can do ?
Upvotes: 0
Views: 866
Reputation: 386
(I know this is an old post but posting anyway for future readers.)
Think of Bitset as as lossless compression of a boolean array Say you have to store 32 boolean values in a collection. Your immediate intuition is to use an array or a list of booleans Say every item in that array will need a byte of space, so the array adds up to 32 bytes But you see, every item is a byte so it is 8 bits. Although just 1 bit is enough to say true or false, you are using 8 bits for that item. Since there are 32 bytes, we end up using 32 * 8 = 256 bits. Then the question arises, "Why don't we use a single 32-bit number to store 1 or 0 in every bit corresponding to those 32 items?" That is nothing but a bitset - you will use just 32 bits as opposed to 256 bits to store the same information. Here, you can say if an item is present or not by checking its corresponding bit position. This kind of storage helps everywhere and particularly in memory intensive code in embedded systems, high-end games with lot of graphics/Math operations per second and specific applications in machine learning too.
Think of Bloom filter as a lossy compression of a boolean array or a lossy compression of a bitset - depending on which underlying datastructure you want to implement with. Here, it is lossy because of the fact that you can never know whether a given item is present BUT you can definitely say if it is absent! It uses a hash function to set certain bits of the bitset to 1 for a given input. For another input, some other bits will be set. The catch is, there can be common bits which are set for two different inputs. Because of this, you cannot say if an item is present because common bits might be set based on multiple items. However even if a single bit is not set for a given input, you can definitely say it is absent. That is why i call this lossy.
Hope this is what you are looking for.
Upvotes: 0
Reputation: 9484
A bloom filter can be implemented using a bitset, but a bitset can not be implemented using a bloom filter.
Upvotes: 0
Reputation: 25536
Bitset vectors are simply a large field of an arbitrary number of bits that can be set individually, using their index.
A bloom filter is a kind of set (not containing the data itself) allowing to decide quickly if an element is contained in the set or not. It is build on top of some kind of bitset vector, setting several of the bits of the latter to 1 on inserting elements or reading them to check if an element is contained (without giving you direct access to its underlying bitset vector).
Upvotes: 2