Reputation: 16774
Chunk size is one of the key design parameters. We have chosen 64 MB, which is much larger than typical file system block sizes. Each chunk replica is stored as a plain Linux file on a chunkserver and is extended only as needed. Lazy space allocation avoids wasting space due to internal fragmentation, perhaps the greatest objection against such a large chunk size.
What is lazy space allocation, and how does it solve the internal fragmentation problem?
A small file consists of a small number of chunks, perhaps just one. The chunkservers storing those chunks may become hot spots if many clients are accessing the same file ... We fixed this problem by storing such executables with a higher replication factor and by making the batch-queue system stagger application start times.
What does staggering application start times mean, and how does it keep chunkservers from becoming hot spots?
Upvotes: 5
Views: 1447
Reputation: 7095
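Lazy space allocation means the filesystem doesn't allocate disk space for a file until data is actually written to it. In GFS, each chunk replica is a plain Linux file that is extended only as needed, so a chunk never occupies its full 64 MB up front. The same idea underlies what are commonly called sparse files. For example, if only the first 2 MB of a 64 MB chunk is used, only about 2 MB is actually consumed on disk. Internal fragmentation is the space wasted when a fixed-size block is only partly filled; because the unwritten remainder of the chunk is never allocated, that waste doesn't occur.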
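To make the 2 MB / 64 MB example concrete, here is a minimal Python sketch that creates a sparse file on a local filesystem and compares its logical size with the space actually allocated. This is only an illustration of sparse files in general, not how GFS itself is implemented; the function name `sparse_chunk_usage` is made up for this example, and it assumes a POSIX system whose filesystem supports sparse files and reports `st_blocks` in 512-byte units (as Linux does).

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB logical chunk size, as in the quoted text
WRITTEN = 2 * 1024 * 1024       # 2 MB actually written, as in the example above

def sparse_chunk_usage(directory=None):
    """Create a 64 MB sparse file, write 2 MB, return (logical, allocated) bytes.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.truncate(CHUNK_SIZE)      # set the logical length; no data is written
            f.seek(0)
            f.write(b"x" * WRITTEN)     # only this region needs real disk blocks
            f.flush()
            os.fsync(f.fileno())
        st = os.stat(path)
        logical = st.st_size
        allocated = st.st_blocks * 512  # assumes 512-byte units, as on Linux
        return logical, allocated
    finally:
        os.remove(path)

if __name__ == "__main__":
    logical, allocated = sparse_chunk_usage()
    print(f"logical size: {logical} bytes, allocated on disk: {allocated} bytes")
```

On a filesystem with sparse-file support, the allocated figure should come out close to 2 MB even though the logical size is 64 MB; the exact number depends on the filesystem.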
Staggering application start times just means not starting everything at once. If every application reads a few configuration files stored in GFS on startup and they all start at the same time, the chunkservers holding those files are hit with a burst of requests. Spreading out the start times spreads out that load.
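A rough sketch of the effect, in Python. This is not the GFS batch-queue scheduler, just a toy model: the helper names (`staggered_offsets`, `peak_starts_per_second`) and the numbers (100 jobs, a 50-second window) are assumptions chosen for illustration. It compares how many jobs hit the shared file in the same second when they all start together versus when their starts are spread evenly over a window.

```python
from collections import Counter

def staggered_offsets(n_jobs, window_seconds):
    """Spread n_jobs start times evenly across window_seconds (hypothetical helper)."""
    if n_jobs <= 0:
        return []
    step = window_seconds / n_jobs
    return [i * step for i in range(n_jobs)]

def peak_starts_per_second(offsets):
    """Largest number of jobs that start within the same one-second bucket."""
    buckets = Counter(int(t) for t in offsets)
    return max(buckets.values(), default=0)

# All 100 jobs start at t = 0: every one reads the shared file in the same second.
simultaneous = [0.0] * 100

# The same 100 jobs, staggered over a 50-second window.
staggered = staggered_offsets(100, 50)

if __name__ == "__main__":
    print("peak starts/sec, simultaneous:", peak_starts_per_second(simultaneous))
    print("peak starts/sec, staggered:   ", peak_starts_per_second(staggered))
```

In this model the peak drops from 100 starts in one second to 2, which is the kind of smoothing that keeps a chunkserver holding a popular small file from becoming a hot spot. A real scheduler might add random jitter instead of even spacing.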
Upvotes: 6