naman wats

Reputation: 41

Google File System block size

Why is the block (chunk) size in GFS 64 MB, even though it may lead to internal fragmentation when the file size is not a multiple of 64 MB?

Upvotes: 4

Views: 2381

Answers (2)

papalagi

Reputation: 770

In their target applications, files are more likely to be large, and a chunk server can serve one large sequential read rather than many small reads, which improves throughput.

Three reasons mentioned in the GFS paper:

  1. Lower the load on the master. The GFS master server only provides chunk metadata, not chunk contents, so fewer requests are sent to the master when chunks are relatively large.
  2. Reduce network overhead. A large chunk size encourages applications to perform many operations on a single chunk over a persistent network connection, and applications get their data in fewer requests.
  3. Reduce the amount of metadata stored in the master. There is only one master server in GFS’s design, and all chunk metadata is kept in the master’s memory to reduce latency and increase throughput. Larger chunks mean less metadata, and less metadata means less time spent loading it.

Besides, chunk location metadata is kept on the chunk servers rather than persisted by the master, for locality reasons. When the master server starts up, it polls the chunk servers for this metadata, so less metadata also means less start-up time.
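To get a feel for the metadata savings, here is a rough back-of-the-envelope sketch. The only figure taken from the GFS paper is that the master keeps less than 64 bytes of metadata per chunk; the 1 PB total and the exact per-chunk byte count below are illustrative assumptions, not measurements:

```python
# Illustrative only: per-chunk metadata size (~64 bytes, per the GFS paper)
# and the 1 PB of stored data are assumptions for this sketch.
PER_CHUNK_METADATA_BYTES = 64
TOTAL_DATA_BYTES = 1 * 2**50          # assume 1 PB of stored data

for chunk_size_mb in (1, 64):
    chunk_size = chunk_size_mb * 2**20
    num_chunks = TOTAL_DATA_BYTES // chunk_size
    metadata_bytes = num_chunks * PER_CHUNK_METADATA_BYTES
    print(f"{chunk_size_mb:>3} MB chunks -> {num_chunks:,} chunks, "
          f"~{metadata_bytes / 2**30:.1f} GiB of in-memory metadata")
```

Under these assumptions, 64 MB chunks need roughly 1 GiB of in-memory metadata for a petabyte of data, while 1 MB chunks would need about 64 GiB, which is much harder for a single master that keeps everything in RAM.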

To limit the impact of the large chunk size, GFS uses lazy space allocation: if a file is only 1 MB in size, GFS only asks the underlying file system for 1 MB rather than 64 MB, avoiding wasted space due to internal fragmentation.
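Lazy allocation is essentially what ordinary sparse files give you on a POSIX file system. A minimal sketch of the idea (not GFS code; the file name and sizes are made up, and the `st_blocks` check assumes a Unix-like system with sparse-file support such as ext4):

```python
import os

CHUNK_SIZE = 64 * 1024 * 1024       # 64 MB logical chunk size
DATA_SIZE = 1 * 1024 * 1024         # only 1 MB of real data

# Create a file whose logical size is a full chunk but which holds just
# 1 MB of data; the file system allocates blocks lazily, so the unwritten
# 63 MB consume no disk space.
with open("chunk_0001", "wb") as f:
    f.truncate(CHUNK_SIZE)          # logical size: 64 MB, nothing allocated yet
    f.write(b"x" * DATA_SIZE)       # 1 MB of actual data at the start
    f.flush()
    os.fsync(f.fileno())            # force allocation so st_blocks is accurate

st = os.stat("chunk_0001")
print(f"logical size : {st.st_size / 2**20:.0f} MB")   # 64 MB
print(f"on-disk usage: {st.st_blocks * 512 / 2**20:.1f} MB")  # roughly 1 MB
os.remove("chunk_0001")
```

Since GFS stores each chunk replica as a plain Linux file on the chunk server and extends it only as needed, a mostly empty 64 MB chunk costs roughly only as much disk as the data it actually holds.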

It may be worth mentioning that the successor of GFS, called Colossus, reduced the chunk size from 64 MB to 1 MB.

Upvotes: 6

Novice
Novice

Reputation: 155

These systems are designed to handle large files. In the same way, HDFS uses a 128 MB block size by default.

Upvotes: 0
