pp492

Reputation: 551

Closing channel when all workers have finished

I am implementing a web crawler and I have a Parse function that takes a link as input and should return all links contained in the page.

I would like to make the most of goroutines to make it as fast as possible. To do so, I want to create a pool of workers.

I set up a channel of strings representing the links, links := make(chan string), and pass it as an argument to the Parse function. I want the workers to communicate through a single channel. When the function starts, it takes a link from links, parses it, and for each valid link found in the page, adds the link to links.

func Parse(links chan string) {
  l := <- links
  // If link already parsed, return
  for _, url := range newUrlFounds { // newUrlFounds: links found on the page
    links <- url
  }
}

However, the main issue is how to signal that no more links will be found. One way I thought of doing it was to wait until all workers have completed, but I don't know how to do that in Go.

Upvotes: 2

Views: 1069

Answers (1)

Peter

Reputation: 31691

As Tim already commented, don't use the same channel for reading and writing in a worker. This will deadlock eventually (even if buffered, because Murphy).

A far simpler design is simply launching one goroutine per URL. A buffered channel can serve as a simple semaphore to limit the number of concurrent parsers (goroutines that don't do anything because they are blocked are usually negligible). Use a sync.WaitGroup to wait until all work is done.

package main

import (
    "sync"
)

func main() {
    sem := make(chan struct{}, 10) // allow ten concurrent parsers
    wg := &sync.WaitGroup{}

    wg.Add(1)
    Parse("http://example.com", sem, wg)

    wg.Wait()
    // all done
}

func Parse(u string, sem chan struct{}, wg *sync.WaitGroup) {
    defer wg.Done()

    sem <- struct{}{}        // grab
    defer func() { <-sem }() // release

    // If URL already parsed, return.

    var newURLs []string

    // ...

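    // range yields (index, value); discard the index to get the URL itself.
    for _, u := range newURLs {
        wg.Add(1) // register the child before it starts
        go Parse(u, sem, wg)
    }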
}

Upvotes: 3
