openwonk

Reputation: 15577

Why do sequential loops run faster than concurrent methods in Go?

Scenario: I want to read a large text file fast (e.g. lorem.txt with 4.5 million lines in the example below). I have tried three different ways in the code below.

  1. Non-concurrent sequential processing
  2. Channels and goroutines
  3. Waitgroup and goroutines

Pseudo-benchmarks: Here is typical output when I run the code below; these are quick-and-dirty time deltas, not full-fledged profiling / benchmarking / testing.

sequential:     43.541828091 secs 4714074 lines
queued channel: 80.986544385 secs 4714074 lines
wait group:     260.200473751 secs 4712266 lines

Question: Why are sequential loops faster than the two other methods below? Am I missing something?

Update: I ran the example code on a much bigger text file (see Scenario). I also "reset" the logger for each example, in case not doing so had some compounding memory effect between the example functions. Also, as others have noted, my computer is dual-core, which may be one of many issues with my code. Thanks for all the feedback / answers.

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "log"
    "os"
    "sync"
    "sync/atomic"
    "time"
)

var (
    textFile = "lorem.txt"
    buf      bytes.Buffer
    l        = log.New(&buf, "logger: ", log.Lshortfile)
    wg       sync.WaitGroup
    delta1   float64
    delta2   float64
    delta3   float64
    cnt1     = 0
    cnt2     = 0
    cnt3     int64 // updated atomically in exampleWaitGroup; a plain int would race there
)

func main() {

    // Wait Group Example
    exampleWaitGroup()

    // Queued Channel Example
    exampleQueuedChannel()

    // Sequential Loop Example
    exampleSequentialLoop()

    benchmarks := fmt.Sprintf("sequential:\t%v secs %v lines\nqueued channel:\t%v secs %v lines\nwait group:\t%v secs %v lines\n",
        delta1, cnt1,
        delta2, cnt2,
        delta3, cnt3,
    )

    fmt.Println(benchmarks)

}

func exampleSequentialLoop() {

    buf.Reset()
    l = log.New(&buf, "logger: ", log.Lshortfile)

    start := time.Now()

    file1, err := os.Open(textFile)
    if err != nil {
        log.Fatal(err)
    }

    defer file1.Close()

    scanner := bufio.NewScanner(file1)

    for scanner.Scan() {
        cnt1++
        l.Println(scanner.Text())
    }

    end := time.Now()
    delta1 = end.Sub(start).Seconds()

}

func exampleQueuedChannel() {

    buf.Reset()
    l = log.New(&buf, "logger: ", log.Lshortfile)

    start := time.Now()
    queue := make(chan string)
    done := make(chan bool)

    go processQueue(queue, done)

    file2, err := os.Open(textFile)
    if err != nil {
        log.Fatal(err)
    }

    defer file2.Close()

    scanner := bufio.NewScanner(file2)

    for scanner.Scan() {
        queue <- scanner.Text()
    }

    close(queue) // signal the consumer that no more lines are coming
    <-done       // wait for the consumer to drain the queue before stopping the clock

    end := time.Now()
    delta2 = end.Sub(start).Seconds()
}

func exampleWaitGroup() {

    buf.Reset()
    l = log.New(&buf, "logger: ", log.Lshortfile)

    start := time.Now()

    file3, err := os.Open(textFile)
    if err != nil {
        log.Fatal(err)
    }

    defer file3.Close()

    scanner := bufio.NewScanner(file3)

    for scanner.Scan() {
        wg.Add(1)
        go func(line string) {
            defer wg.Done()
            l.Println(line)
            atomic.AddInt64(&cnt3, 1) // a plain cnt3++ here is a data race across goroutines
        }(scanner.Text())
    }

    wg.Wait()

    end := time.Now()
    delta3 = end.Sub(start).Seconds()

}

func processQueue(queue chan string, done chan bool) {
    for line := range queue {
        l.Println(line)
        cnt2++
    }
    done <- true
}

Upvotes: 3

Views: 1669

Answers (3)

Siddharth Shishulkar

Reputation: 131

I don't think there should be a performance gain. In a producer/consumer scenario, introducing concurrency (for the consumers) only makes sense if your consumer is slower than the producer; in that case you would expect a gain from introducing multiple consumers.

But here the consumer is already significantly faster than the producer (the IO), hence no performance gain.

Upvotes: 1

leaf bebop

Reputation: 8232

Before answering the question, I would like to point out that your method of benchmarking is problematic. I would not say a 200-line file is large or sufficient for benchmarking purposes. And there is an "official" way of benchmarking in Go, which you can read about in the docs of the testing package.
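For illustration, a minimal sketch of that style, assuming the question's code is in package main and this sits next to it in a _test.go file (BenchmarkSequentialLoop is a name I made up):

package main

import "testing"

func BenchmarkSequentialLoop(b *testing.B) {
    // The testing package calls this with increasing b.N until the
    // timing is statistically stable.
    for i := 0; i < b.N; i++ {
        exampleSequentialLoop()
    }
}

Run it with go test -bench=. and the testing package picks the iteration count and reports time per operation for you.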

There is a Go idiom, probably the most famous one: Concurrency is not parallelism. What most people expect to make a program run faster is parallelism, not concurrency. In fact, on a single-core CPU, where parallelism is not possible, concurrency usually makes things slower, because switching between goroutines (threads, coroutines, whatever) has costs.

Your code behaves much like the single-core situation: there is not much parallelism, but a lot of switching between goroutines. Furthermore, the logging call (l.Println) requires synchronization, since the logger serializes writers with a mutex, and that part cannot benefit from parallelism.

Upvotes: 2

phemmer

Reputation: 8846

In exampleQueuedChannel you're not doing anything in parallel. Yes, you've launched another goroutine, but there is no parallel processing. The reason is that queue is an unbuffered chan. When you write to an unbuffered chan, the writer blocks until someone reads off it. So essentially you block while writing, the scheduler puts your goroutine to sleep and wakes up the reading goroutine, then that one goes to sleep and the writer wakes back up. You're thrashing between the two goroutines, and the scheduler is getting a heavy workout.

If you want better performance here, use a buffered chan. And if you want even more performance, batch multiple items per chan message (for a long technical explanation of the performance impact of channels, read this). A sketch of both ideas follows.
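This sketch reuses scanner and l from the question's code; the channel capacity 64 and batch size 1024 are arbitrary numbers I picked, not tuned values:

queue := make(chan []string, 64) // buffered: the producer rarely has to block
done := make(chan bool)

go func() {
    for batch := range queue {
        for _, line := range batch {
            l.Println(line)
        }
    }
    done <- true
}()

batch := make([]string, 0, 1024)
for scanner.Scan() {
    batch = append(batch, scanner.Text())
    if len(batch) == cap(batch) {
        queue <- batch // one chan operation covers 1024 lines
        batch = make([]string, 0, 1024)
    }
}
if len(batch) > 0 {
    queue <- batch // flush the final partial batch
}
close(queue)
<-done

Each chan operation now amortizes its scheduling cost over many lines instead of one.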

In exampleWaitGroup, you're launching a new goroutine for every line. While launching a goroutine isn't expensive, it isn't free either, and it means more work for the scheduler. defer is also not free. And your logger uses a mutex, so if two of your goroutines try to log at the same time, one is put to sleep: again, more scheduler work. A sketch of one common alternative, a fixed worker pool, is below.
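The pool below reuses scanner, l, and wg from the question's code, needs "runtime" added to the imports, and sizes the pool at one worker per CPU, which is just a starting point, not a recommendation:

lines := make(chan string, 256)

for i := 0; i < runtime.NumCPU(); i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for line := range lines {
            l.Println(line) // the logger's mutex still serializes this part
        }
    }()
}

for scanner.Scan() {
    lines <- scanner.Text()
}
close(lines)
wg.Wait()

This caps the goroutine count at a handful instead of one per line, though the shared logger remains a serialization point.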

You can investigate these matters yourself by running your code under a profiler and seeing where the bottlenecks lie.
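For example, a minimal sketch of CPU profiling with the standard runtime/pprof package (the file name cpu.prof and the wrapper function profiled are my own choices):

import (
    "log"
    "os"
    "runtime/pprof"
)

func profiled() {
    f, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    pprof.StartCPUProfile(f)
    defer pprof.StopCPUProfile() // runs before f.Close, flushing the profile

    exampleSequentialLoop() // or whichever example you want to inspect
}

Then open the result with go tool pprof cpu.prof to see where the time goes.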

Upvotes: 5
