PeopleMoutainPeopleSea
PeopleMoutainPeopleSea

Reputation: 1512

Why this Go program is so slow?

I just read some short tutorials of Go and wrote a simple program sieve. Sieve uses sieve algorithm to print all the prime number that is smaller than 10000, which create a lot of go routines. I got the correct results but the program is very slow (5 seconds on my machine). I also wrote lua script and python script which implemented the same algorithm, and runs a lot faster (both are about 1 second on my machine).

Note that the purpose is to have idea of go routine's performance compared with coroutine in other languages, for example lua. The implementation is very inefficient, some comments pointed out that it's not correct way to implement Sieve of Eratosthenes. Yes, that's intentional. Some other replies pointed out that slowness is caused by print I/O. So I commented out print lines.

My question is why my sieve program implemented in Go is so slow? Here is the code:

package main

import (
  "fmt"
  "sync"
)


type Sieve struct {
  id int;
  msg_queue chan int;
  wg *sync.WaitGroup;
}


func NewSieve(id int) *Sieve {
  sieve := new(Sieve)
  sieve.id = id
  sieve.msg_queue = make(chan int)
  sieve.wg = new(sync.WaitGroup)
  sieve.wg.Add(1)
  return sieve
}


func (sieve *Sieve) run() {
  defer sieve.wg.Done()

  myprime := <-sieve.msg_queue
  if myprime == 0 {
    return
  }
  // fmt.Printf("Sieve (%d) is for prime number %d.\n", sieve.id, myprime)

  next_sieve := NewSieve(sieve.id + 1)
  go next_sieve.run()
  for {
    number := <-sieve.msg_queue
    if number == 0 {
      next_sieve.msg_queue <- number;
      next_sieve.wg.Wait()
      return
    } else if number % myprime != 0 {
      // fmt.Printf("id: %d, number: %d, myprime: %d, number mod myprime: %d\n", sieve.id, number, myprime, number % myprime)
      next_sieve.msg_queue <- number
    }
  }
}


func driver() {
  first := NewSieve(2)
  go first.run()
  for n := 2; n <= 10000; n++ {
    first.msg_queue <- n
  }
  first.msg_queue <- 0
  first.wg.Wait()
}


func main() {
  driver()
}

As a comparison, here is the code of sieve.lua

function sieve(id)
  local myprime = coroutine.yield()
  // print(string.format("Sieve (%d) is for prime number %d", id, myprime))

  local next_sieve = coroutine.create(sieve)
  coroutine.resume(next_sieve, id + 1)

  while true do
    local number = coroutine.yield()
    if number % myprime ~= 0 then
      // print(string.format("id: %d, number: %d, myprime: %d, number mod myprime: %d", id, number, myprime, number % myprime))
      coroutine.resume(next_sieve, number)
    end
  end
end


function driver()
  local first = coroutine.create(sieve)
  coroutine.resume(first, 2)
  local n
  for n = 2, 10000 do
    coroutine.resume(first, n)
  end
end

driver()

Upvotes: 0

Views: 1013

Answers (2)

peterSO
peterSO

Reputation: 166529

Meaningless microbenchmarks produce meaningless results.

You are timing print I/O.

You are incurring go routine and channel overhead for a small amount of work.


Here is a prime number sieve program in Go.

Output:

$ go version
go version devel +46be01f4e0 Sun Oct 13 01:48:30 2019 +0000 linux/amd64
$ go build sumprimes.go && time ./sumprimes
5736396
29.96µs   
real    0m0.001s
user    0m0.001s
sys     0m0.000s

sumprimes.go:

package main

import (
    "fmt"
    "time"
)

const (
    prime    = 0x00
    notprime = 0xFF
)

func oddPrimes(n uint64) (sieve []uint8) {
    sieve = make([]uint8, (n+1)/2)
    sieve[0] = notprime
    p := uint64(3)
    for i := p * p; i <= n; i = p * p {
        for j := i; j <= n; j += 2 * p {
            sieve[j/2] = notprime
        }
        for p += 2; sieve[p/2] == notprime; p += 2 {
        }
    }
    return sieve
}

func sumPrimes(n uint64) uint64 {
    sum := uint64(0)
    if n >= 2 {
        sum += 2
    }
    for i, p := range oddPrimes(n) {
        if p == prime {
            sum += 2*uint64(i) + 1
        }
    }
    return sum
}

func main() {
    start := time.Now()

    var n uint64 = 10000
    sum := sumPrimes(n)
    fmt.Println(sum)

    fmt.Println(time.Since(start))
}

Upvotes: 7

torek
torek

Reputation: 487735

Most of the time is spent in fmt.Printf.

Taking out the line:

fmt.Printf("id: %d, number: %d, myprime: %d, number mod myprime: %d\n", sieve.id, number, myprime, number%myprime)

reduces runtime from ~5.4 seconds to ~0.64 seconds on one test I ran.

Taking out the unnecessary sync.WaitGroups reduces the time a bit further, to ~0.48 seconds. See the version without sync.WaitGroup here. You're still doing a lot of channel operations, which languages with yield-value-from-coroutine operators do not need (though they have their own issues instead). This is not a good way to implement primality testing.

Upvotes: 4

Related Questions