roger

Reputation: 9913

What does "Scan advances the Scanner to the next token" mean in Go's bufio.Scanner?

According to the Scanner.Scan documentation, Scan() advances the Scanner to the next token, but what does that mean? I find that the values collected from Scanner.Text and Scanner.Bytes can differ, which is puzzling.

This test doesn't always fail, but it does once the file gets large enough:

func TestScanner(t *testing.T) {
    path := "/tmp/test.txt"
    f, err := os.Open(path)
    if err != nil {
        panic(fmt.Sprint("failed to open ", path))
    }
    defer f.Close()
    scanner := bufio.NewScanner(f)

    bs := make([][]byte, 0)
    for scanner.Scan() {
        bs = append(bs, scanner.Bytes())
    }

    f, err = os.Open(path)
    if err != nil {
        panic(fmt.Sprint("failed to open ", path))
    }
    defer f.Close()
    scanner = bufio.NewScanner(f)
    ss := make([]string, 0)
    for scanner.Scan() {
        ss = append(ss, scanner.Text())
    }

    for i, b := range bs {
        if string(b) != ss[i] {
            t.Errorf("expect %s, got %s", ss[i], string(b))
        }
    }
}

Upvotes: 1

Views: 509

Answers (1)

Thundercat

Reputation: 120999

The token is defined by the scanner's split function. Scan() returns true once the split function has found a token, and false when the input is exhausted or an error occurs.

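The Text() and Bytes() methods both return the current token. Text() returns a newly allocated string holding a copy of the token. Bytes() does not allocate memory; it returns a slice into the scanner's internal buffer, and that backing array may be overwritten by a subsequent call to Scan(). That is why the mismatch only shows up once the file is large enough for the scanner to reuse its buffer.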

Copy the slice returned from Bytes() to avoid this issue:

for scanner.Scan() {
    bs = append(bs, append([]byte(nil), scanner.Bytes()...))
}

Upvotes: 4
