JGut

Reputation: 532

Reading bytes from a file concurrently

I've written a program in Go that reads a single byte from a file and checks to see which bits are set. These files are usually pretty large (around 10 - 100 GB), so I don't want to read the entire file into memory. The program normally has to check millions of separate bytes.

Right now I perform these reads with os.File.ReadAt(). This ended up being pretty slow, so I tried to use goroutines to speed it up. For example:

var wg sync.WaitGroup
threadCount := 8

for i := 0; i < threadCount; i += 1 {
    wg.Add(1)
    go func(id int) {
        defer wg.Done()
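        index := int64(id) // ReadAt takes an int64 offset
        myByte := make([]byte, 1)

        // numBytesInFile is assumed to be an int64 (e.g. from FileInfo.Size()).
        for index < numBytesInFile { // Stop when thread would attempt to read byte outside of file
            fmt.Println(file.ReadAt(myByte, index)) // prints bytes read and error
            index += int64(threadCount)
        }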
        }
    }(i)
}
wg.Wait()

However, using goroutines here didn't speed the program up at all (in fact, it made it slightly slower due to overhead). I would have thought that files on disk could be read concurrently as long as they are opened in read-only mode (which I do in my program). Is what I'm asking for impossible, or is there some way I can make concurrent reads to a file in Go?

Upvotes: 2

Views: 2084

Answers (1)

Sankar

Reputation: 6541

Your slowness is caused by I/O, not CPU, so adding more threads will not speed up your program. Read about Amdahl's law: https://en.wikipedia.org/wiki/Amdahl%27s_law
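
To make the Amdahl's law point concrete, here is a small sketch of the formula speedup = 1 / ((1 - p) + p/n), where p is the fraction of the run time that can be spread over n workers. The function name amdahlSpeedup and the 5% figure are assumptions chosen purely for illustration; the real split between I/O wait and CPU work in your program would have to be measured.

```go
package main

import "fmt"

// amdahlSpeedup returns the theoretical speedup when a fraction p of the
// run time (0 <= p <= 1) is divided across n workers and the remaining
// (1 - p) stays serial.
func amdahlSpeedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	// Assumption for illustration only: 5% of the run time is CPU work
	// that goroutines can parallelize; the other 95% is spent in I/O that
	// does not get faster with more goroutines.
	const parallelFraction = 0.05
	for _, n := range []int{1, 2, 8, 64} {
		fmt.Printf("workers=%d speedup=%.3f\n", n, amdahlSpeedup(parallelFraction, n))
	}
}
```

Under that assumption, even a very large number of goroutines cannot push the speedup much past 1, which would be consistent with what you observed.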

If you do not want to read the full file into memory, you could use a buffered reader and read the file in parts (https://golang.org/pkg/bufio/#NewReader), or you could consider the experimental memory-mapped files package: https://godoc.org/golang.org/x/exp/mmap

To know more about memory mapped files, see https://en.wikipedia.org/wiki/Memory-mapped_file

Upvotes: 4
