Reputation: 861
PROBLEM
I have a list of arrays and I want to count how many times each distinct array occurs.
For example, if I have this:
{{1,2,3},
{1,0,3},
{1,2,3},
{5,2,6},
{5,2,6},
{5,2,6}}
I want a map (or any relevant collection) like this:
{ {1,2,3} -> 2,
{1,0,3} -> 1,
{5,2,6} -> 3 }
I don't even need to keep the array values; I'm only interested in the counts (e.g. 2, 1 and 3 here).
MY SOLUTION
I use the following algorithm:
First, hash each array and check whether the hash is in a HashMap<Integer, ArrayList<int[]>>, let's name it distinctHash, where the key is the hash and the value is an ArrayList, let's name it rowList, containing the different arrays that share this hash (to handle collisions).
If the hash is not in distinctHash, put the array with the value 1 in another HashMap<int[], Long> that counts each occurrence, let's call it distinctElements.
If the hash is in distinctHash, check whether the corresponding array is contained in rowList. If it is, increment the value in distinctElements associated with the identical array found in rowList. (If you use the new array as a key, you will create another key, since the references are different.)
Here is the code. The returned boolean tells whether a new distinct array was found. I apply this function sequentially to all of my arrays:
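HashMap<int[], Long> distinctElements;
HashMap<Integer, ArrayList<int[]>> distinctHash;

private boolean addRow(int[] row) {
    // Assumption: the posted snippet never defines `hash`. Arrays.hashCode stands in
    // here for the Guava murmur3_128 hash mentioned in the question.
    int hash = Arrays.hashCode(row);
    if (distinctHash.containsKey(hash)) {
        for (int[] previousRow : distinctHash.get(hash)) {
            if (Arrays.equals(previousRow, row)) {
                // Increment the count of the matching array itself. The posted code
                // always used the first array in the list, which miscounts on collisions.
                distinctElements.put(
                    previousRow,
                    distinctElements.get(previousRow) + 1
                );
                return false;
            }
        }
        // Same hash but a different array: record it as a new distinct row.
        distinctElements.put(row, 1L);
        ArrayList<int[]> rowList = distinctHash.get(hash);
        rowList.add(row);
        distinctHash.put(hash, rowList);
        return true;
    } else {
        // First array seen with this hash.
        distinctElements.put(row, 1L);
        ArrayList<int[]> newValue = new ArrayList<>();
        newValue.add(row);
        distinctHash.put(hash, newValue);
        return true;
    }
}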
QUESTION
The problem is that my algorithm is too slow for my needs (40s for 5,000,000 arrays, and 2h-3h for 20,000,000 arrays). Profiling with NetBeans told me that the hashing takes 70% of runtime (using Google Guava murmur3_128 hash function).
Is there another algorithm that could be faster? As I said, I'm not interested in the array values, only in the number of occurrences of each. I am ready to sacrifice precision for speed, so a probabilistic algorithm is fine.
Upvotes: 2
Views: 95
Reputation: 159185
Wrap the int[] in a class that implements equals and hashCode, then build a Map of the wrapper class to instance count.
class IntArray {
    private int[] array;

    public IntArray(int[] array) {
        this.array = array;
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(this.array);
    }

    @Override
    public boolean equals(Object obj) {
        return (obj instanceof IntArray && Arrays.equals(this.array, ((IntArray) obj).array));
    }

    @Override
    public String toString() {
        return Arrays.toString(this.array);
    }
}
Test
int[][] input = {{1,2,3},
                 {1,0,3},
                 {1,2,3},
                 {5,2,6},
                 {5,2,6},
                 {5,2,6}};
Map<IntArray, Long> map = Arrays.stream(input).map(IntArray::new)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
map.entrySet().forEach(System.out::println);
Output
[1, 2, 3]=2
[1, 0, 3]=1
[5, 2, 6]=3
Note: The above solution is faster and uses less memory than the solution by Ravindra Ranwala, but it does require the creation of an extra class, so it is debatable which is better.
For smaller arrays, use the simpler solution below by Ravindra Ranwala.
For larger arrays, the above solution is likely better.
Map<List<Integer>, Long> map = Stream.of(input)
        .map(a -> Arrays.stream(a).boxed().collect(Collectors.toList()))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
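The wrapper approach can also be written as a plain loop instead of a stream, which may be easier to drop into code that receives rows one at a time. The following is a minimal sketch, not a measured improvement: IntArrayKey and RowCounter are hypothetical names introduced here for illustration, and the wrapper stores the array without copying it, so it assumes rows are not modified after being counted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper (same idea as IntArray above): value-based equals/hashCode for int[].
final class IntArrayKey {
    private final int[] array;

    IntArrayKey(int[] array) {
        this.array = array; // not copied: assumes the caller does not mutate the row later
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(array);
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof IntArrayKey && Arrays.equals(array, ((IntArrayKey) obj).array);
    }

    @Override
    public String toString() {
        return Arrays.toString(array);
    }
}

class RowCounter {
    // Counts how often each distinct row occurs, with a single map operation per row.
    static Map<IntArrayKey, Long> count(int[][] rows) {
        Map<IntArrayKey, Long> counts = new HashMap<>();
        for (int[] row : rows) {
            counts.merge(new IntArrayKey(row), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] input = {{1, 2, 3}, {1, 0, 3}, {1, 2, 3}, {5, 2, 6}, {5, 2, 6}, {5, 2, 6}};
        Map<IntArrayKey, Long> counts = count(input);
        counts.forEach((row, n) -> System.out.println(row + "=" + n));
        // If only the cardinalities matter, the keys can be ignored entirely.
        System.out.println(counts.values());
    }
}
```

Map.merge inserts 1 for a row seen for the first time and otherwise adds 1 to the existing count, so no separate containsKey check is needed.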
Upvotes: 4
Reputation: 18838
If duplicate arrays always have their elements in the same order and each array is short, you can map each array to an int and then reuse the last part of your method. Although this reduces the time spent hashing, it relies on assumptions that might not hold in your case.
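One way to read this suggestion is to pack each short array into a single primitive key. The sketch below uses a long rather than an int to leave more room, and it rests on assumptions that are not stated in the question: every row has the same length, that length is at most 8, and every value lies in 0..255. PackedRowCounter and pack are hypothetical names. Under those assumptions the packing is exact, so no collision handling is needed.

```java
import java.util.HashMap;
import java.util.Map;

class PackedRowCounter {
    // Packs a row into one long, 8 bits per element.
    // ASSUMPTIONS (not from the original post): all rows have the same length,
    // at most 8 elements, and every value is in the range 0..255.
    static long pack(int[] row) {
        if (row.length > 8) {
            throw new IllegalArgumentException("row too long to pack: " + row.length);
        }
        long key = 0L;
        for (int v : row) {
            if (v < 0 || v > 255) {
                throw new IllegalArgumentException("value out of range: " + v);
            }
            key = (key << 8) | v;
        }
        return key;
    }

    // Counts occurrences per packed key; the original arrays are not retained.
    static Map<Long, Long> count(int[][] rows) {
        Map<Long, Long> counts = new HashMap<>();
        for (int[] row : rows) {
            counts.merge(pack(row), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] input = {{1, 2, 3}, {1, 0, 3}, {1, 2, 3}, {5, 2, 6}, {5, 2, 6}, {5, 2, 6}};
        // Only the cardinalities are printed, since the array values are not needed.
        System.out.println(count(input).values());
    }
}
```

If rows can differ in length, two different rows such as {0, 1} and {1} would pack to the same key, which is why the equal-length assumption matters.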
Upvotes: 0
Reputation: 21124
You may do it like so,
Map<List<Integer>, Long> result = Stream.of(source)
        .map(a -> Arrays.stream(a).boxed().collect(Collectors.toList()))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
And here's the output,
{[1, 2, 3]=2, [1, 0, 3]=1, [5, 2, 6]=3}
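The conversion to List<Integer> matters because Java arrays inherit identity-based equals and hashCode from Object, whereas List compares element by element. The sketch below contrasts the two on the sample data; KeyEqualityDemo and its method names are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class KeyEqualityDemo {
    // Groups by the raw int[]: arrays use identity equals/hashCode,
    // so every row becomes its own key even when the contents match.
    static Map<int[], Long> countByArray(int[][] rows) {
        return Stream.of(rows)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Groups by a boxed List<Integer>: List.equals/hashCode are element-based,
    // so rows with equal contents share one key.
    static Map<List<Integer>, Long> countByList(int[][] rows) {
        return Stream.of(rows)
                .map(a -> Arrays.stream(a).boxed().collect(Collectors.toList()))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        int[][] source = {{1, 2, 3}, {1, 0, 3}, {1, 2, 3}, {5, 2, 6}, {5, 2, 6}, {5, 2, 6}};
        System.out.println(countByArray(source).size()); // one key per row
        System.out.println(countByList(source).size());  // one key per distinct content
    }
}
```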
Upvotes: 3