Reputation: 11
i have a big N*1 name array
i am currently using goroutine to calculate the edit distance of a name among each other
the question is the results at [B] [C] are different, maybe like
ABC BCD 7
ABC BCD 3
there are 20000 records in names
var names []string
divide names into two chunks
nameCount := len(names)
procs := 2
chunkSize := nameCount / procs
channel
ch := make(chan int)
var wg sync.WaitGroup
for i := 0; i < procs; i++ { //create two goroutines
start := i * chunkSize
end := (i+1)*chunkSize - 1
fmt.Println(start, end) //get slice start and end
wg.Add(1)
go func(slices []string, allnames []string) {
for _, slice := range slices {
minDistance = 256
distance := 0
sum := 0
for _, name := range allnames {
distance = calcEditDist(slice, name) //get the LD [A]
sum += 1
if distance > 0 && distance < minDistance {
minDistance = distance
fmt.Println(slice, name, distance) //[B]
fmt.Println(slice, name, calcEditDist(slice, name)) //[C]
} else if distance == minDistance {
fmt.Println(slice, name, distance)
fmt.Println(slice, name, calcEditDist(slice, name))
}
}
// for _, name := range allnames {
// fmt.Println(slice, name)
// }
ch <- sum
// fmt.Println(len(allnames), slice)
break
}
wg.Done()
}(names[start:end], names)
}
i placed the calcEditDist @https://github.com/copywrite/keyboardDistance/blob/master/parallel.go
PS:
if i declare
var dp [max][max]int
in calcEditDist as local variable instead of global, the results are right, but is incredibly slow
UPDATE 1
Thanks all buddy,i take the great advice below in three steps
1) i shrinked the dp to a much reasonable size, like 100 or even smaller, DONE
2) i put the dp declaration in each goroutine and pass its pointer as Nick said, DONE
3) later i will try to dynamically alloc dp, LATER
the performance improved steeply, ╰(°▽°)╯
Upvotes: 1
Views: 192
Reputation: 54079
As you've identified in your posting, having dp
as a global variable is the problem.
Allocating it each time in CalcEditDistance
is too slow.
You have two possible solutions.
1) you only need 1 dp
array per go-routine, so allocate it in the for loop loop and pass a pointer to it (don't pass the array directly as arrays pass by value which will involve a lot of copying!)
for i := 0; i < procs; i++ { //create two goroutines
start := i * chunkSize
end := (i+1)*chunkSize - 1
fmt.Println(start, end) //get slice start and end
wg.Add(1)
go func(slices []string, allnames []string) {
var dp [max][max]int // allocate
for _, slice := range slices {
minDistance = 256
distance := 0
sum := 0
for _, name := range allnames {
distance = calcEditDist(slice, name, &dp) // pass dp pointer here
Change calcEditDist to take the dp
func CalcEditDist(A string, B string, dp *[max][max]int) int {
lenA := len(A)
lenB := len(B)
2) Re-write your calcEditDistance
so it doesn't need the massive O(N^2) dp array.
If you study the function carefully it only ever accesses a row up and a column to the left, so all the storage you actually need is a previous row and a previous columns which you could allocate dynamically at very little cost. This would make it scale to any length of string too.
That would need a bit of careful thought though!
Upvotes: 1