Reputation: 405
I tried to compare the performance of writing a big file to the local filesystem versus HDFS, and the result confuses me: the elapsed time for the local write is much shorter than for the HDFS write. I don't understand the concept that "Hadoop is good for sequential data access"...
[root@datanodetest01 tmp]# dd if=/dev/zero of=testfile count=1 bs=256M
1+0 records in
1+0 records out
268435456 bytes (268 MB) copied, 0.324765 s, 827 MB/s
[root@datanodetest01 tmp]# time hadoop fs -put testfile /tmp
real 0m3.461s
user 0m6.829s
sys 0m0.666s
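For completeness, the read path could be timed the same way, keeping the local disk out of it by sending the output to /dev/null:
time hadoop fs -cat /tmp/testfile > /dev/null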
Upvotes: 1
Views: 2792
Reputation: 5538
HDFS comes with a plethora of benefits, which are listed here and here.
Please note that whether you store the data on a local disk or on HDFS, you ultimately want to do some processing on it. The whole Big Data technology stack builds on HDFS's characteristics to process data quickly and in a fault-tolerant manner.
The difference between copying data locally and copying it into HDFS can be attributed to the following facts (the commands after this list show how to check both on a running cluster):
1) HDFS makes three copies of each block by default, so the data stays highly available even if a machine in the cluster goes down.
2) Those copies are kept on different machines across the cluster, so every write involves network I/O.
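As a rough way to see both effects (a sketch, assuming the /tmp/testfile from the question and stock Hadoop 2.x shell commands):
hdfs getconf -confKey dfs.replication              # configured replication factor (3 by default)
hadoop fs -stat "%r %b %n" /tmp/testfile           # replication, length and name recorded for this file
hdfs fsck /tmp/testfile -files -blocks -locations  # which datanodes hold which block replicas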
Also note the following, from http://hadooptutorials.co.in/tutorials/hadoop/hadoop-fundamentals.html:
Hadoop uses blocks to store a file or parts of a file. A Hadoop block is a file on the underlying filesystem. Since the underlying filesystem stores files as blocks, one Hadoop block may consist of many blocks in the underlying file system. Blocks are large. They default to 64 megabytes each and most systems run with block sizes of 128 megabytes or larger.
Hadoop is designed for streaming or sequential data access rather than random access. Sequential data access means fewer seeks, since Hadoop only seeks to the beginning of each block and begins reading sequentially from there.
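To check the block size actually used for the file from the question (again only a sketch, same assumptions as above):
hdfs getconf -confKey dfs.blocksize    # cluster default block size in bytes
hadoop fs -stat "%o" /tmp/testfile     # block size recorded for this particular file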
A good further read: Hadoop sequential data access
Upvotes: 4
Reputation: 4499
Hadoop stores files as blocks (128 MB by default) and replicates these blocks onto the disks of different nodes.
When you read data from a spinning disk, the drive head has to move across the platter to the right location before it can read anything. If you read many small files scattered across the disk, the seek time needed to move the head between those locations adds significant overhead and reduces throughput.
If you read a single large file, the head seeks once and then reads continuously, so the total seek time is a small fraction of the read time.
This is what is meant by "Hadoop is efficient for sequential access"; it is not so good for random access, where seek time dominates.
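A rough back-of-the-envelope illustration (the numbers are only ballpark assumptions: ~10 ms average seek, ~100 MB/s sequential transfer):
one 128 MB block:                0.01 s seek + 1.28 s transfer -> seeking is under 1% of the total
1000 scattered files of ~128 KB: 1000 x 0.01 s = 10 s spent on seeks for roughly the same amount of data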
You should not compare Hadoop performance directly with a local write, since Hadoop incurs heavy network overhead transferring block replicas to and from different nodes.
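If you want to see how much of the gap comes from replication alone, one option (a sketch, not verified on your cluster; FsShell in Hadoop 2.x honours the -D generic option) is to time the same put with a single replica:
time hadoop fs -Ddfs.replication=1 -put testfile /tmp/testfile_rep1
The remaining difference is roughly the cost of writing one copy over the network instead of locally.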
Upvotes: 1