Reputation: 171
I'm trying to parallelize my quadtree lookup by using go-routines to share the recursive lookup.
My Quadtree struct looks like this:
type Quadtree struct {
Rectangle //Boundary of the quadtree
Points []Point
Ne *Quadtree
Se *Quadtree
Sw *Quadtree
Nw *Quadtree
Divided bool
Capacity int
Parent *Quadtree
}
I have a method called QueryForNearestPoints
which accepts a Point
and a searchRadius
and returns all the points within the searcharea
in the Quadtree. If no points are found within the given searchRadius
, then the searchRadius
is increased and tried again.
points
is the channel to which points in the search-radius are send to. I'm also using a sync.WaitGroup
wg
to wait for all go-routines to finish.
func (q *Quadtree) QueryForNearestPoints(p Point, searchRadius float64) []QueryResult {
searcharea = Circle{p, searchRadius}
var testResults []QueryResult
for {
go func() {
loop:
for {
select {
case item, _ := <-points:
testResults = append(testResults, item)
case <-done:
break loop
}
}
}()
wg.Add(1)
go q.query(searcharea)
wg.Wait()
done <- true
if len(testResults) > 0 {
break
} else {
// Proportionally increase the search radius if no points are found.
searcharea = Circle{p, searcharea.Radius + searcharea.Radius*50/100}
}
}
return testResults
}
The parallelized quadtree's query method:
func (q *Quadtree) query(area Circle) {
defer wg.Done()
if !q.overlapsCircle(area) {
return
}
if len(q.Points) > 0 {
for i, point := range q.Points {
if area.contains(point) {
points <- QueryResult{point, i, q}
}
}
}
if q.Divided {
wg.Add(4)
go q.Ne.parallelquery(area)
go q.Se.parallelquery(area)
go q.Sw.parallelquery(area)
go q.Nw.parallelquery(area)
}
}
(q *Quadtree) parallelquery
method:
func (q *Quadtree) parallelquery(area Circle) {
semaphore <- struct{}{}
defer func() {
<-semaphore
}()
q.query(area)
}
If I'm not wrong this is a cpu-bound workload so having 1000's of go-routines isn't going to help, so I'm using a buffered channel semaphore
as a counting semaphore to make sure that at any point only n number of go-routines are active.(Here I made n as 4 because my computer has 4 cpus.)
The only problem is that this approach is significantly slower than the normal non-concurrent sequential approach. What am I missing here? How can i make this faster?
Upvotes: 1
Views: 434
Reputation: 5898
This is much slower because the work that you are doing in each of the goroutines is very cheap, and there are very many goroutines.
For those that will criticise the terminology in this answer, I am going to use parallelism and concurrency interchangeably. Even though strictly, parallelism is a property of the runtime / computer, and concurrency is a property of the code. Some code can still be concurrent if it is running on hardware with a single processor, but it will not be running in parallel.
The reason that the concurrent version here is slower is because there is still some synchronisation required when using channels etc. Under the hood, channels use a mutex to synchronise the sends / receives on a channel. This means that you are introducing contention between the goroutines.
Each time you call parallelquery(area)
you are starting 4 more goroutines that must be scheduled, but (using a semaphore channel) you've limited the number of concurrent goroutines to 4.
After a n recursions you've got 4^n goroutines all trying to receive a value from the semaphore channel, this causes a lot of contention.
Using a blocking CPU profiler on this you will probably find that most of the execution time is spent on contention attempting to get a token from the semaphore.
Upvotes: 2