Adam Woolhether
Adam Woolhether

Reputation: 81

Improving performance of reading with bufio.NewScanner

A simple program to serve one purpose:

  1. Read a script file line by line, create a string while ignoring any blank new lines or comments (including the shebang). Adding a ';' at the end of a line if needed. (I know, I know, backslashes and ampersands, etc)

My question is:

How to improve the performance of this small program? In a different answer I've read about utilizing scanner.Bytes() instead of scanner.Text(), but this doesn't seem feasible as a string is what I want.

Sample code with test file: https://play.golang.org/p/gzSTLkP3BoB

Here is the simple program:

func main() {
    file, err := os.Open("./script.sh")
    if err != nil {
        log.Fatalln(err)
    }
    defer file.Close()

    var a strings.Builder
    scanner := bufio.NewScanner(file)

    for scanner.Scan() {
        lines := scanner.Text()

        switch {
        case lines == "" || lines[:1] == "#":
            continue
        case lines[len(lines)-1:] != ";":
            a.WriteString(lines + "; ")
        default:
            a.WriteString(lines + " ")
        }
    }

    fmt.Println(a.String())
}

Upvotes: 2

Views: 1103

Answers (3)

Benny Siegert
Benny Siegert

Reputation: 99

You may be able to improve performance by buffering the output as well.

func main() {
  output := bufio.NewWriter(os.Stdout)

  // instead of Printf, use
  fmt.Fprintf(output, "%s\n", a)
}

Upvotes: 0

shmsr
shmsr

Reputation: 4204

I used strings.Builder and ioutil.ReadAll to improve the performance. As you are dealing with small shell scripts I assumed that read the file all at once should not put pressure on memory (I used ioutil.ReadAll). I also allocated just once to make sufficient store for strings.Builder — reduced allocations.

  • doFast: faster implementation
  • doSlow: slower implementation (what you've originally done)

Now, let's look at the benchmark results:

goos: darwin
goarch: amd64
pkg: test
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkDoFast-8         342602          3334 ns/op        1280 B/op          3 allocs/op
BenchmarkDoSlow-8         258896          4408 ns/op        4624 B/op          8 allocs/op
PASS
ok      test    2.477s

We can see that doFast is not only faster but only makes lesser allocations. Metrics measured are lower the better.

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io/ioutil"
    "os"
    "strings"
)

func open(filename string) (*os.File, error) {
    return os.Open(filename)
}

func main() {
    fd, err := open("test.sh")
    if err != nil {
        panic(err)
    }
    defer fd.Close()

    outputA, err := doFast(fd)
    if err != nil {
        panic(err)
    }

    fd.Seek(0, 0)
    outputB, err := doSlow(fd)
    if err != nil {
        panic(err)
    }

    fmt.Println(outputA)
    fmt.Println(outputB)
}

func doFast(fd *os.File) (string, error) {
    b, err := ioutil.ReadAll(fd)
    if err != nil {
        return "", err
    }

    var res strings.Builder
    res.Grow(len(b))

    bLines := bytes.Split(b, []byte("\n"))

    for i := range bLines {
        switch {
        case len(bLines[i]) == 0 || bLines[i][0] == '#':
        case bLines[i][len(bLines[i])-1] != ';':
            res.Write(bLines[i])
            res.WriteString("; ")
        default:
            res.Write(bLines[i])
            res.WriteByte(' ')
        }
    }

    return res.String(), nil
}

func doSlow(fd *os.File) (string, error) {
    var a strings.Builder
    scanner := bufio.NewScanner(fd)

    for scanner.Scan() {
        lines := scanner.Text()

        switch {
        case lines == "" || lines[:1] == "#":
            continue
        case lines[len(lines)-1:] != ";":
            a.WriteString(lines + "; ")
        default:
            a.WriteString(lines + " ")
        }
    }

    return a.String(), nil
}

Note: I didn't use bufio.NewScanner; is it required?

Upvotes: 3

Thundercat
Thundercat

Reputation: 121009

It is feasible to use scanner.Bytes(). Here's the code:

func main() {
    file, err := os.Open("./script.sh")
    if err != nil {
        log.Fatalln(err)
    }
    defer file.Close()

    var a strings.Builder
    scanner := bufio.NewScanner(file)

    for scanner.Scan() {
        lines := scanner.Bytes()

        switch {
        case len(lines) == 0 || lines[0] == '#':
            continue
        case lines[len(lines)-1] != ';':
            a.Write(lines)
            a.WriteString("; ")
        default:
            a.Write(lines)
            a.WriteByte(' ')
        }
    }

    fmt.Println(a.String())
}

This program avoids the string allocation in scanner.Text(). The program may not be faster in practice if the program speed is limited by I/O.

Run it on the playground.

If your goal is to write the result to stdout, then write to a bufio.Writer instead of a strings.Builder. This change replaces one or more allocations in strings.Builder with a single allocation in bufio.Writer.

func main() {
    file, err := os.Open("./script.sh")
    if err != nil {
        log.Fatalln(err)
    }
    defer file.Close()

    a := bufio.NewWriter(os.Stdout)
    defer a.Flush() // flush buffered data on return from main.

    scanner := bufio.NewScanner(file)

    for scanner.Scan() {
        lines := scanner.Bytes()

        switch {
        case len(lines) == 0 || lines[0] == '#':
            continue
        case lines[len(lines)-1] != ';':
            a.Write(lines)
            a.WriteString("; ")
        default:
            a.Write(lines)
            a.WriteByte(' ')
        }
    }
}

Run it on the playground.

Bonus improvement: use lines := bytes.TrimSpace(scanner.Bytes()) to handle whitespace before a '#' and after a ';'

Upvotes: 1

Related Questions