nohup

Reputation: 3165

Does bufio.NewScanner in Go read the entire file into memory instead of one line at a time?

I was trying to read a file line by line with the following function using bufio.NewScanner.

func TailFromStart(fd *os.File, wg *sync.WaitGroup)  {

    fd.Seek(0,0)
    scanner := bufio.NewScanner(fd)
    for scanner.Scan() {
        line := scanner.Text()
        offset, _ := fd.Seek(0, 1)
        fmt.Println(offset)
        fmt.Println(line)
        offsetreset, _ := fd.Seek(offset, 0)
        fmt.Println(offsetreset)
    }
    offset, err := fd.Seek(0, 1)
    CheckError(err)
    fmt.Println(offset)
    wg.Done()

}

I expected it to print increasing offsets; however, it prints the same value on every iteration until it reaches EOF.

127.0.0.1 - - [11/Aug/2016:22:10:39 +0530] "GET /ttt HTTP/1.1" 404 437 "-" "curl/7.38.0"
613
613
127.0.0.1 - - [11/Aug/2016:22:10:42 +0530] "GET /qqq HTTP/1.1" 404 437 "-" "curl/7.38.0"
613

613 is the total number of bytes in the file:

cat /var/log/apache2/access.log | wc
  7      84     613

Am I misunderstanding something, or does bufio.NewScanner read the entire file into memory and then iterate over it in memory? If so, is there a better way to read line by line?

Upvotes: 4

Views: 4070

Answers (2)

Albi

Reputation: 1865

You can increase the scanner's buffer size, for example:

scanner := bufio.NewScanner(file)
buf := make([]byte, 0, 64*1024)
scanner.Buffer(buf, 1024*1024) // 1024*1024 => 1 MB maximum token size (raise this value to handle longer lines)
for scanner.Scan() {
    // do your stuff
}

Upvotes: 1

user6169399

Reputation:

See the docs for func (s *Scanner) Buffer(buf []byte, max int):

Buffer sets the initial buffer to use when scanning and the maximum size of buffer that may be allocated during scanning. The maximum token size is the larger of max and cap(buf).
If max <= cap(buf), Scan will use this buffer only and do no allocation.

By default, Scan uses an internal buffer and sets the maximum token size to MaxScanTokenSize.

Buffer panics if it is called after scanning has started.

And:

MaxScanTokenSize is the maximum size used to buffer a token unless the user provides an explicit buffer with Scan.Buffer. The actual maximum token size may be smaller as the buffer may need to include, for instance, a newline.

MaxScanTokenSize = 64 * 1024

startBufSize = 4096 // Size of initial allocation for buffer.

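No. As @JimB said, the Scanner reads only one buffer's worth of data at a time and then splits lines out of that buffer. A file smaller than 4096 bytes is read into the buffer in a single read, which is why the offset of your 613-byte file is already 613 after the first line. For a big file, the first read fetches only 4096 bytes. Try this test sample with a big file: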

package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
)

func main() {
    fd, err := os.Open("big.txt")
    if err != nil {
        panic(err)
    }
    defer fd.Close()

    n, err := fd.Seek(0, io.SeekStart)
    if err != nil {
        panic(err)
    }
    fmt.Println("n =", n) // 0

    // Scan only the first line; the Scanner still fills its buffer.
    scanner := bufio.NewScanner(fd)
    if scanner.Scan() {
        fmt.Println(scanner.Text())
    }

    // The file offset reflects what the Scanner has read, not what it has returned.
    offset, err := fd.Seek(0, io.SeekCurrent)
    if err != nil {
        panic(err)
    }
    fmt.Println("offset =", offset) // 4096

    offsetreset, err := fd.Seek(offset, io.SeekStart)
    if err != nil {
        panic(err)
    }
    fmt.Println("offsetreset =", offsetreset) // 4096

    offset, err = fd.Seek(0, io.SeekCurrent)
    if err != nil {
        panic(err)
    }
    fmt.Println("offset =", offset) // 4096
}

output:

n = 0

offset = 4096
offsetreset = 4096
offset = 4096

Upvotes: 4
