Reputation: 8747
This question has been asked previously but there was no answer for it at that time so I decided to ask it again.
I need an efficient implementation of a Bloom filter in C (not C++). If there is no such thing available, I would not mind implementing one, if given some good reference so that it doesn't take too much of my time.
I want to use this data structure for inserts and tests in a ratio (1:20k), so primarily it is test-intensive. The data to be tested is 64 bit integers.
Upvotes: 14
Views: 15342
Reputation: 12848
I am not sure how efficient it is, because I haven't done extensive benchmarks, but I did document the chain of thoughts on "How To Implement a Bloom Filter in C" in this article, with the associated GitHub repo.
The code is simplified and it works on char*
, but it can be extended for void*
with a little bit of tweaking.
The main approach would be to look for a Bit Vector (also known as a Bit Array) implementation that will represent the underlying memory structure for your Bloom Filter.
There are a few options here:
The second step is to find yourself some good hash functions. Usually, I recommend using MurmurHash, SpookyHash, or FNV.
After having those in place, the code is easier to write, write yourself a struct like this one:
typedef uint32_t (*hash32_func)(const void *data, size_t length);
typedef struct bloom_filter_s {
bit_vect *vect; // <--- Implementation dependent
hash32_func *hash_functions;
size_t num_functions;
size_t num_items;
} bloom_filter;
bloom_filter *bloom_filter_new(size_t size, size_t num_functions, ...);
bloom_filter *bloom_filter_new_default(size_t size);
void bloom_filter_free(bloom_filter *filter);
void bloom_filter_put(bloom_filter *filter, const void *data, size_t length);
void bloom_filter_put_str(bloom_filter *filter, const char *str);
bool bloom_filter_test(bloom_filter *filter, const void *data, size_t lentgth);
bool bloom_filter_test_str(bloom_filter *filter, const char *str);
And then start implementing the methods.
For example, adding an element to the Bloom filter should be as simple as:
void bloom_filter_put(bloom_filter *filter, const void *data, size_t length){
for(unsigned i = 0; i < filter->num_functions; i++) {
uint32_t cur_hash = filter->hash_functions[i](data, length);
bit_vect_set1(filter->vect, cur_hash % filter->vect->size);
filter->num_items++;
}
}
Also, contrary to popular belief you don't need K hash functions, you only need two and you can generate the others from them. Source here.
Upvotes: 0
Reputation: 50127
Several Bloom filter implementations, and alternative algorithms, are available in this project: https://github.com/FastFilter/fastfilter_cpp
Covered are the regular Bloom filter, blocked Bloom filter, cuckoo filter, Golomb coded set, and others. It is C++, but the main algorithms are easy to port to C.
Upvotes: 1
Reputation: 400019
Not to do too much self-promotion, but I've written a plugin for the Geany editor/IDE that filters out duplicate text lines, it uses a Bloom filter.
The implementation is in C, and you can find it right here on GitHub. It's GPL v3, so depending on your exact needs you may or may not be able to use it.
Some notes about my implementation:
BloomContains
callback function pointer used by bloom_filter_new()
). Just pass NULL
to get a "pure" filter.It hasn't been heavily tuned for performance, but should be okay. I would appreciate any feedback you might have after testing it, of course!
Upvotes: 4
Reputation: 189
I know this is an old question, but for reference, here are some github search results.
A simple search on github for 'bloomfilter' yields a ton of results (for C) :
https://github.com/search?q=extension%3Ac+bloomfilter&type=Code&ref=searchresults
This is for REFERENCE only.
Upvotes: 0
Reputation: 189
I have a stand-alone plain C library here which may be of use: https://github.com/jvirkki/libbloom
Upvotes: 17