KD157

Reputation: 823

Write operation cost

I have a Go program which writes strings into a file. I have a loop that is iterated 20000 times, and in each iteration I am writing around 20-30 strings into the file. I just wanted to know which is the best way to write them to the file.

I am assuming approach 2 (collecting all the strings in a buffer and writing it to the file once) should work better than approach 1 (writing each string to the file as it is produced). Can someone confirm this with a reason? How is writing everything at once better than writing periodically, given that the file will be open anyway? I am using f.WriteString(<string>) and buffer.WriteString(<some string>), where buffer is of type bytes.Buffer and f is the open file pointer (*os.File).

Upvotes: 5

Views: 4587

Answers (3)

kostya

Reputation: 9569

The bufio package was created exactly for this kind of task. Instead of making a syscall for each Write call, bufio.Writer buffers up to a fixed number of bytes in internal memory before making a syscall. After a syscall, the internal buffer is reused for the next portion of data.

Compared to your second approach, bufio.Writer

  • makes more syscalls (roughly N/S instead of 1)
  • uses less memory (S bytes instead of N bytes)

where S is the buffer size (which can be set via bufio.NewWriterSize) and N is the total size of the data that needs to be written.
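To make the "N/S instead of 1" comparison concrete, the following sketch counts how many `Write` calls reach the underlying writer under each strategy. It uses a hypothetical `countingWriter` in place of the file and an assumed workload of 20000 × 25 strings of 40 bytes each (N = 20,000,000 bytes). On a real `*os.File`, each counted call would be at least one write syscall.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// countingWriter stands in for the file and counts the Write calls that reach
// it. On an *os.File, each such call is at least one write syscall.
type countingWriter struct {
	calls, written int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.written += len(p)
	return len(p), nil
}

// Hypothetical workload: 20000 iterations x 25 strings of 40 bytes each,
// so N = 20,000,000 bytes in total.
const total = 20000 * 25

var line = strings.Repeat("x", 39) + "\n"

// direct writes every string straight to the underlying writer.
func direct() *countingWriter {
	w := &countingWriter{}
	for i := 0; i < total; i++ {
		w.Write([]byte(line))
	}
	return w
}

// atOnce collects everything in memory (N bytes) and writes it once.
func atOnce() (w *countingWriter, held int) {
	w = &countingWriter{}
	var buf bytes.Buffer
	for i := 0; i < total; i++ {
		buf.WriteString(line)
	}
	held = buf.Len()
	buf.WriteTo(w)
	return w, held
}

// buffered goes through a bufio.Writer that holds at most size (S) bytes.
func buffered(size int) *countingWriter {
	w := &countingWriter{}
	bw := bufio.NewWriterSize(w, size)
	for i := 0; i < total; i++ {
		bw.WriteString(line) // countingWriter never fails, so errors are ignored here
	}
	bw.Flush()
	return w
}

func main() {
	d := direct()
	fmt.Printf("direct:            %6d writes\n", d.calls)
	a, held := atOnce()
	fmt.Printf("bytes.Buffer:      %6d writes, %d bytes held\n", a.calls, held)
	for _, size := range []int{4096, 64 * 1024} {
		b := buffered(size)
		fmt.Printf("bufio (S = %5d): %6d writes, %d bytes held\n", size, b.calls, size)
	}
}
```

With these assumed sizes, the program reports 500,000 writes for the direct approach and 1 for `bytes.Buffer` (which holds all N bytes). It reports 4,883 writes for S = 4096 and 306 for S = 64 KiB, i.e. N/S rounded up, while holding only S bytes.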

Example usage (https://play.golang.org/p/AvBE1d6wpT):

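package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
)

func main() {
    f, err := os.Create("file.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // Writes are collected in bufio's buffer (4096 bytes by default).
    w := bufio.NewWriter(f)
    fmt.Fprint(w, "Hello, ")
    fmt.Fprint(w, "world!")
    // Flush writes out the rest and reports any error from earlier writes.
    err = w.Flush() // Don't forget to flush!
    if err != nil {
        log.Fatal(err)
    }
}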

Upvotes: 8

max taldykin

Reputation: 12908

Syscalls are not cheap, so the second approach is better.

You can use the lat_syscall tool from lmbench to measure how long a single write call takes:

$ ./lat_syscall write
Simple write: 0.1522 microseconds

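So, on my system, 20000 write calls take approximately 20000 * 0.15μs = 3ms of extra time. Calling write for every string (20-30 strings per iteration, i.e. 400,000-600,000 calls) adds roughly 60-90ms just for the write syscalls.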

Upvotes: 1

Elwinar

Reputation: 9519

The operations that take time when writing to files are the syscalls and the disk I/O. Keeping the file open doesn't cost you anything. So, naively, we could say that the second method is best.

Now, as you may know, your OS doesn't write directly to files; it uses an internal in-memory cache for files that are written and does the real I/O later. I don't know the exact details of that, and generally speaking I don't need to.

What I would advise is a middle-ground solution: fill a buffer during each loop iteration and write it to the file once per iteration (20000 writes in total). That way you cut a large part of the syscalls and (potentially) disk writes, without consuming too much memory for the buffer (depending on the size of your strings, that may be a point to take into account).

I would suggest benchmarking to find the best solution, but because of the caching done by the OS, benchmarking disk I/O is a real nightmare.

Upvotes: 4
