XiaoMing
XiaoMing

Reputation: 11

code block in goroutine produces strange wrong results

i have a big N*1 name array
i am currently using goroutine to calculate the edit distance of a name among each other

the question is the results at [B] [C] are different, maybe like

ABC BCD 7
ABC BCD 3

there are 20000 records in names

var names []string

divide names into two chunks

nameCount := len(names)  
procs := 2  
chunkSize := nameCount / procs 

channel

ch := make(chan int)  
var wg sync.WaitGroup


for i := 0; i < procs; i++ { //create two goroutines
    start := i * chunkSize
    end := (i+1)*chunkSize - 1
    fmt.Println(start, end) //get slice start and end
    wg.Add(1)
    go func(slices []string, allnames []string) {
        for _, slice := range slices {
            minDistance = 256
            distance := 0
            sum := 0
            for _, name := range allnames {
                distance = calcEditDist(slice, name) //get the LD [A]
                sum += 1
                if distance > 0 && distance < minDistance {
                    minDistance = distance
                    fmt.Println(slice, name, distance) //[B]
                    fmt.Println(slice, name, calcEditDist(slice, name)) //[C]
                } else if distance == minDistance {
                    fmt.Println(slice, name, distance)
                    fmt.Println(slice, name, calcEditDist(slice, name))
                }
            }
            // for _, name := range allnames {
            //  fmt.Println(slice, name)
            // }
            ch <- sum
            // fmt.Println(len(allnames), slice)
            break
        }
        wg.Done()
    }(names[start:end], names)
}

i placed the calcEditDist @https://github.com/copywrite/keyboardDistance/blob/master/parallel.go

PS:
if i declare

var dp [max][max]int

in calcEditDist as local variable instead of global, the results are right, but is incredibly slow

UPDATE 1
Thanks all buddy,i take the great advice below in three steps
1) i shrinked the dp to a much reasonable size, like 100 or even smaller, DONE
2) i put the dp declaration in each goroutine and pass its pointer as Nick said, DONE
3) later i will try to dynamically alloc dp, LATER

the performance improved steeply, ╰(°▽°)╯

Upvotes: 1

Views: 192

Answers (1)

Nick Craig-Wood
Nick Craig-Wood

Reputation: 54079

As you've identified in your posting, having dp as a global variable is the problem.

Allocating it each time in CalcEditDistance is too slow.

You have two possible solutions.

1) you only need 1 dp array per go-routine, so allocate it in the for loop loop and pass a pointer to it (don't pass the array directly as arrays pass by value which will involve a lot of copying!)

for i := 0; i < procs; i++ { //create two goroutines
    start := i * chunkSize
    end := (i+1)*chunkSize - 1
    fmt.Println(start, end) //get slice start and end
    wg.Add(1)
    go func(slices []string, allnames []string) {
        var dp [max][max]int // allocate
        for _, slice := range slices {
            minDistance = 256
            distance := 0
            sum := 0
            for _, name := range allnames {
                distance = calcEditDist(slice, name, &dp) // pass dp pointer here

Change calcEditDist to take the dp

func CalcEditDist(A string, B string, dp *[max][max]int) int {
        lenA := len(A)
        lenB := len(B)

2) Re-write your calcEditDistance so it doesn't need the massive O(N^2) dp array.

If you study the function carefully it only ever accesses a row up and a column to the left, so all the storage you actually need is a previous row and a previous columns which you could allocate dynamically at very little cost. This would make it scale to any length of string too.

That would need a bit of careful thought though!

Upvotes: 1

Related Questions