Reputation: 73
I'm trying to get the size of bloom filter set while using bloom filter's union & intersection functions with python package(https://github.com/jaybaird/python-bloomfilter.git)
I though that after conducting the function 'union' or 'intersection', then I could get the result by adding len() function, but it just print out only '0' output.
from pybloom import BloomFilter
bf1 = BloomFilter(1000)
bf2 = BloomFilter(1000)
# After adding some elements to bf1 and bf2
print(len(bf1.union(bf2)))
# expected max(len(bf1), len(bf2)) but the result was 0
After I find the document page, I realized that the len() option become disabled after 'union' function and its actual result len() was 0.
Instead, I want to approximate the size of bloom filter set somehow. Do you have any idea in order to calculate the size of it?
Upvotes: 3
Views: 904
Reputation: 2882
The implementation only copies BloomFilter's bitarray, i.e. self.bitarray
. The elements self.count
in previous filters are not counted in.
So it doesn't union the elements - but do a bitarray or
.
Update:
In most cases you don't need to approximate the count. It provided a precise count of elements when you call add
, and you can just call len(bf3)
. Unfortunately new created bf3
has not been called add
so len(bf3) == 0
.
For the formula to approximate number of elements,
- m / k * ln(1- n / m)
You have
import math.log as ln
m = bf3.bitarray.length()
n = bf3.bitarray.count()
k = bf3.num_slices
# given m=20, n=8, approximate n elements as 5.89
Upvotes: 3