Reputation: 532
I've written a program in Go that reads a single byte from a file and checks to see which bits are set. These files are usually pretty large (around 10 - 100 GB), so I don't want to read the entire file into memory. The program normally has to check millions of separate bytes.
Right now, the way I'm performing these reads is by using os.File.ReadAt(). This ended up being pretty slow, so I tried to use goroutines to speed it up. For example:
var wg sync.WaitGroup
threadCount := 8
for i := 0; i < threadCount; i++ {
	wg.Add(1)
	go func(id int) {
		defer wg.Done()
		index := int64(id) // ReadAt takes an int64 offset
		myByte := make([]byte, 1)
		for index < numBytesInFile { // Stop before the thread would read a byte outside the file
			fmt.Println(file.ReadAt(myByte, index))
			index += int64(threadCount)
		}
	}(i)
}
wg.Wait()
However, using goroutines here didn't speed the program up at all (in fact, it made it slightly slower due to overhead). I would have thought that files on the disk could be read concurrently as long as they are opened in read-only mode (which I do in my program). Is what I'm asking for impossible, or is there some way I can make concurrent reads to a file in Go?
Upvotes: 2
Views: 2084
Reputation: 6541
Your slowness comes from I/O, not CPU, so adding more threads will not speed up your program. Read about Amdahl's law: https://en.wikipedia.org/wiki/Amdahl%27s_law
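As a rough illustration of why: Amdahl's law says that if a fraction p of the work can be parallelized across s threads, the best possible speedup is

S(s) = 1 / ((1 - p) + p/s)

If the program spends nearly all of its time waiting on the disk, p is close to 0 and S(s) stays close to 1 no matter how many goroutines you add.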
If you do not want to read the full file into memory, you could either use a buffered reader and read it in parts (https://golang.org/pkg/bufio/#NewReader), or you could consider the experimental memory-mapped files package: https://godoc.org/golang.org/x/exp/mmap
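For example, here is a minimal sketch of the buffered approach (the file name, buffer size, and the per-bit counting are assumptions for illustration, based on what the question describes):

package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	f, err := os.Open("data.bin") // placeholder file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// A 1 MiB buffer means one syscall services about a million ReadByte calls.
	r := bufio.NewReaderSize(f, 1<<20)

	var setBitCounts [8]int64
	for {
		b, err := r.ReadByte() // served from the buffer, not the disk, most of the time
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// Check which bits of this byte are set.
		for bit := 0; bit < 8; bit++ {
			if b&(1<<uint(bit)) != 0 {
				setBitCounts[bit]++
			}
		}
	}
	fmt.Println(setBitCounts)
}

The win here is sequential access: each large buffered read replaces roughly a million one-byte ReadAt syscalls.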
To know more about memory mapped files, see https://en.wikipedia.org/wiki/Memory-mapped_file
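And a minimal sketch using the x/exp/mmap package (the file name and the sampling stride are placeholders). With a memory-mapped file the OS pages data in on demand, so random single-byte access does not cost a syscall per read:

package main

import (
	"fmt"
	"log"

	"golang.org/x/exp/mmap"
)

func main() {
	r, err := mmap.Open("data.bin") // placeholder file name
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// Random access into the mapping; here we sample every 4096th byte as an example.
	for i := 0; i < r.Len(); i += 4096 {
		b := r.At(i)
		fmt.Printf("byte %d: %08b\n", i, b)
	}
}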
Upvotes: 4