Reputation: 356
I'm having some issues optimizing a go map.
I want to generate a frequency table (count distinct occurrences) from an array of strings. My code works fine for small arrays, but once I start working with 100k+ elements, with many distinct values, it just isn't performant enough.
Right now, my approach is to generate an array of the distinct values, compare each row against them, and increment the counter variable (a map keyed by the string).
counter := make(map[string]int)
for _, distinct := range distinctStrArray {
    for _, row := range StrArray {
        if row == distinct {
            counter[distinct]++
        }
    }
}
I've tried another approach, where the input array is sorted beforehand (to minimize the number of writes to the map). This is a bit faster.
count := 0
for _, distinct := range distinctStrArray {
    for _, row := range StrArray {
        if row == distinct {
            count++
        }
    }
    counter[distinct] += count
    count = 0
}
Do you have any suggestions for optimizing a simple count(distinct)-type problem? I'm open to anything.
Thanks!
Upvotes: 1
Views: 6448
Reputation: 46542
Without more context, I would drop the separate array of distinct values: generating it takes time, and using it forces the nested loop. Assuming there's no other purpose for that second array, I'd use something like:
counter := make(map[string]int)
for _, row := range StrArray {
    counter[row]++
}
If you need the list of distinct strings (without the counts) for some separate purpose, you can easily get it afterward:
distinctStrings := make([]string, len(counter))
i := 0
for k := range counter {
    distinctStrings[i] = k
    i++
}
Scanning the whole array once per distinct string is O(n*m), where m is the number of distinct values (approaching O(n^2) when most values are distinct), while a Go map access by key is amortized O(1). That takes your overall work from O(n*m) down to O(n), which should be a significant improvement with larger datasets. But, as with any optimization: test, measure, analyze, optimize.
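To make the comparison concrete, here is a minimal, self-contained sketch that times both approaches on synthetic data. The function names, slice sizes, and value format are made up for illustration; for real measurements you'd want the `testing` package's benchmark support rather than wall-clock timing.

```go
package main

import (
	"fmt"
	"time"
)

// nestedCount mirrors the original approach: one full pass over
// the data for every distinct value.
func nestedCount(data, distinct []string) map[string]int {
	counter := make(map[string]int)
	for _, d := range distinct {
		for _, row := range data {
			if row == d {
				counter[d]++
			}
		}
	}
	return counter
}

// mapCount is the single-pass version: each row increments its
// own counter directly via the map.
func mapCount(data []string) map[string]int {
	counter := make(map[string]int)
	for _, row := range data {
		counter[row]++
	}
	return counter
}

func main() {
	// Synthetic dataset: 100k rows drawn from 1k distinct values.
	distinct := make([]string, 1000)
	for i := range distinct {
		distinct[i] = fmt.Sprintf("value-%d", i)
	}
	data := make([]string, 100000)
	for i := range data {
		data[i] = distinct[i%len(distinct)]
	}

	start := time.Now()
	a := nestedCount(data, distinct)
	nestedDur := time.Since(start)

	start = time.Now()
	b := mapCount(data)
	mapDur := time.Since(start)

	fmt.Printf("nested: %v  single-pass: %v  same size: %v\n",
		nestedDur, mapDur, len(a) == len(b))
}
```

The gap widens as the number of distinct values grows, since the nested version re-scans all 100k rows once per distinct string.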
Upvotes: 9