Reputation: 823
I have a Go program that writes strings to a file. A loop runs 20000 times, and each iteration writes around 20-30 strings to the file. I want to know the best way to write them.
Approach 1: Open the file at the start of the program and write each string as it is produced. That comes to about 20000*30 write operations.
Approach 2: Use Go's bytes.Buffer, store everything in the buffer, and write it to the file once at the end. In this case, should the file be opened at the start of the program or only at the end? Does it matter?
I assume approach 2 should perform better. Can someone confirm this and explain why? How is writing everything at once better than writing periodically, given that the file is open the whole time anyway?
I am using f.WriteString(<string>) and buffer.WriteString(<some string>), where buffer is of type bytes.Buffer and f is the open file pointer.
Upvotes: 5
Views: 4587
Reputation: 9569
The bufio package was created for exactly this kind of task. Instead of making a syscall for each Write call, bufio.Writer buffers up to a fixed number of bytes in internal memory before making a syscall. After the syscall, the internal buffer is reused for the next portion of data.
Compared to your second approach, bufio.Writer makes more syscalls (N/S instead of 1) but uses less memory (S bytes instead of N bytes), where S is the buffer size (which can be set via bufio.NewWriterSize) and N is the total size of the data to be written.
Example usage (https://play.golang.org/p/AvBE1d6wpT):
f, err := os.Create("file.txt")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
w := bufio.NewWriter(f)
fmt.Fprint(w, "Hello, ")
fmt.Fprint(w, "world!")
err = w.Flush() // Don't forget to flush!
if err != nil {
	log.Fatal(err)
}
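To make the N/S figure concrete, here is a minimal sketch that counts how many Write calls actually reach the underlying writer when the data goes through bufio.NewWriterSize. The countingWriter type and the writeAll function are hypothetical helpers introduced only for this illustration; the counter stands in for a real file, so each counted call corresponds to what would be one write syscall.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter is a hypothetical stand-in for a file: it discards the
// data and only records how many Write calls and bytes it received.
type countingWriter struct {
	calls int
	bytes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.bytes += len(p)
	return len(p), nil
}

// writeAll writes n copies of s to dst through a bufio.Writer with the
// given buffer size, then flushes whatever is left in the buffer.
func writeAll(dst io.Writer, bufSize, n int, s string) error {
	w := bufio.NewWriterSize(dst, bufSize)
	for i := 0; i < n; i++ {
		if _, err := w.WriteString(s); err != nil {
			return err
		}
	}
	return w.Flush()
}

func main() {
	var c countingWriter
	// 20000 iterations * 30 strings, as in the question; the 10-byte
	// string and the 4096-byte buffer are arbitrary example values.
	if err := writeAll(&c, 4096, 20000*30, "0123456789"); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("%d bytes reached the underlying writer in %d Write calls\n", c.bytes, c.calls)
}
```

The number of calls should come out close to the total size divided by the buffer size, far below the 600000 WriteString calls made by the loop.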
Upvotes: 8
Reputation: 12908
Syscalls are not cheap, so the second approach is better.
You can use the lat_syscall tool from lmbench to measure how long a single write call takes:
$ ./lat_syscall write
Simple write: 0.1522 microseconds
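So, on my system it will take approximately 20000 * 0.15 μs = 3 ms of extra time for one write call per iteration. With 20-30 separate write calls per iteration, as in the question, that scales to roughly 60-90 ms.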
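If lmbench is not at hand, a rough comparison can also be made from Go itself. The sketch below is only an illustration: the compare function is a hypothetical helper, the temporary file name pattern is arbitrary, and the timings it prints depend heavily on the OS, the file system, and caching, so they should be read as an order-of-magnitude hint rather than a benchmark.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"time"
)

// compare writes n copies of s to a temporary file twice: first with one
// f.WriteString call per string, then through a bufio.Writer. It returns
// the elapsed time of each pass.
func compare(n int, s string) (direct, buffered time.Duration, err error) {
	f, err := os.CreateTemp("", "writes-*.txt")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// Pass 1: every string goes straight to the file.
	start := time.Now()
	for i := 0; i < n; i++ {
		if _, err := f.WriteString(s); err != nil {
			return 0, 0, err
		}
	}
	direct = time.Since(start)

	// Pass 2: same data through a bufio.Writer, including the final Flush.
	w := bufio.NewWriter(f)
	start = time.Now()
	for i := 0; i < n; i++ {
		if _, err := w.WriteString(s); err != nil {
			return 0, 0, err
		}
	}
	if err := w.Flush(); err != nil {
		return 0, 0, err
	}
	buffered = time.Since(start)
	return direct, buffered, nil
}

func main() {
	direct, buffered, err := compare(20000*30, "some string\n")
	if err != nil {
		fmt.Println("comparison failed:", err)
		return
	}
	fmt.Println("one write per string:", direct)
	fmt.Println("through bufio.Writer:", buffered)
}
```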
Upvotes: 1
Reputation: 9519
The operations that take time when writing to files are the syscalls and the disk I/O. Keeping the file open costs you nothing. So, naively, we could say that the second method is best.
Now, as you may know, your OS doesn't write directly to files; it uses an internal in-memory cache for written data and does the real I/O later. I don't know the exact details of that, and generally speaking I don't need to.
What I would advise is a middle-ground solution: build a buffer for each loop iteration and write it out once per iteration, so N writes for N iterations. That way you cut a large part of the syscalls and (potentially) disk writes without consuming too much memory for the buffer (depending on the size of your strings, that may be a point to take into account).
I would suggest benchmarking for the best solution, but due to the caching done by the system, benchmarking disk I/O is a real nightmare.
Upvotes: 4