Reputation: 548
I have a large data set with range 0 to Long.MAX_VALUE,
and would like to search for any duplication using BitSet.
While Java BitSet doesn't allow long for its functions.
Is it possible to achieve with BitSet?
// incoming data have range 0 to 9,223,372,036,854,775,807 (Long max value)
// e.g. 1, 3, 5, 1, 2_000_000_000, 2_000_000_000
// expected output: 1, 2_000_000_000, as they appear twice
long[] myData = new long[]{1, 3, 5, 1, 2_000_000_000, 2_000_000_000};
// int[] myData = new int[]{1, 3, 5, 1}; // it working well for int array
BitSet bs = new BitSet();
for(int i = 0; i < myData.length; i++) {
if(bs.get(myData[i])) { // fail here as bitset only accept int
System.out.println("duplicated number: " + myData[i]);
} else {
bs.set(myData[i]); // same here
}
}
Upvotes: 0
Views: 624
Reputation: 40044
BitSet
could be used by seeing which bits were previously set. That would then constitute a duplicate value. However, you can't set a bit position greater than Integer.MAX_VALUE (and it would be infeasible to handle such a large range for longs). So it would not work for the range you suggested. And I presume you would still want to record the duplicates.
I would use a Map<Long,Long>
to do a frequency count. Then you can just determine the exact count of each value provided. And locating the next Key of a map is comparable to calculating which internal long value holds the desired bit. So I don't believe performance is a factor here.
If you simply want to eliminate duplicates, then just put them in a Set<Long>
Based on your comment, check out this simple test for holding one large value in a BitSet.
BitSet bitSet = new BitSet();
bitSet.set(Integer.MAX_VALUE);
long[] backingArray = bitSet.toLongArray();
System.out.printf("Size of backing array = %,d longs.%n",backingArray.length);
Prints
Size of backing array = 33,554,432 longs.
Upvotes: 2