Reputation: 2143
I have this:
long hnds[] = new long[133784560]; // 133 million
Then I quickly fill the array (couple of ms) and then I somehow want to know the number of unique (i.e. distinct) values. Now, I don't even need this realtime, I just need to try out a couple of variations and see how many unique values each gives.
I tried e.g. this:
import org.apache.commons.lang3.ArrayUtils;
....
HashSet<Long> length = new HashSet<Long>(Arrays.asList(ArrayUtils.toObject(hnds)));
System.out.println("size: " + length.size());
and after waiting for half an hour it gives a heap space error (I have Xmx4000m).
I also tried initializing Long[] hnds instead of long[] hnds, but then the initial filling of the array takes forever. Or for example use a Set from the beginning when adding the values, but also then it takes forever. Is there any way to count the distinct values of a long[] array without waiting forever? I'd write it to a file if I have to, just some way.
Upvotes: 1
Views: 301
Reputation: 40510
Just sort it and count.
int sz = 133784560;
Random randy = new Random();
long[] longs = new long[sz];
for(int i = 0; i < sz; i++) { longs[i] = randy.nextInt(10000000); }
Arrays.sort(longs);
long lastSeen = longs[0];
long count = 0;
for(int i = 1; i < sz; i++) {
if(longs[i] != lastSeen) count++;
lastSeen = longs[i];
}
Takes about 15 seconds on my laptop.
Upvotes: 1
Reputation: 198211
My best suggestion would be to use a library like fastutil (http://fastutil.di.unimi.it/) and then use the custom unboxed hash set:
import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
System.out.println(new LongOpenHashSet(hnds).size());
(Also, by the way, if you can accept approximate answers, there are much more efficient algorithms you can try; see e.g. this paper for details.)
Upvotes: 3