Reputation: 63
Please help me to understand the difference between HDFS' data blocks and RDDs in Spark. HDFS distributes a dataset across multiple nodes in a cluster as fixed-size blocks, and each block is replicated multiple times and stored. RDDs are created as parallelized collections. Are the elements of a parallelized collection distributed across nodes, or are they stored in memory for processing? Is there any relation to HDFS' data blocks?
Upvotes: 2
Views: 4713
Reputation: 71
Is there any relation to HDFS' data blocks?
In general, no. They address different issues.
Distribution is a common denominator, but that is about it, and the failure-handling strategies are obviously different (DAG recomputation and replication, respectively).
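To illustrate the RDD side, here is a minimal Scala sketch (the local master and partition count are assumptions, chosen only for demonstration). It shows that a parallelized collection is split into Spark-managed partitions that can be processed on different nodes, with no HDFS blocks involved at all:

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelizeDemo {
      def main(args: Array[String]): Unit = {
        // Local master for illustration only; on a real cluster the
        // partitions would be distributed across executor nodes.
        val conf = new SparkConf().setAppName("ParallelizeDemo").setMaster("local[4]")
        val sc = new SparkContext(conf)

        // A parallelized collection is split into partitions (4 here),
        // each of which can be scheduled on a different node.
        val rdd = sc.parallelize(1 to 100, numSlices = 4)
        println(s"Number of partitions: ${rdd.getNumPartitions}") // -> 4

        sc.stop()
      }
    }

Note that the elements only materialize in executor memory when an action runs; losing a partition triggers recomputation from the lineage (the DAG), not retrieval of a replica.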
Spark can use Hadoop Input Formats and read data from HDFS. In that case there will be a relationship between HDFS blocks and Spark splits. However, Spark doesn't require HDFS, and many components of the newer APIs don't use Hadoop Input Formats anymore.
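For example, reading a file from HDFS through the classic RDD API typically yields one Spark partition per HDFS block, since the underlying input splits follow block boundaries by default (a sketch; the path is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    object HdfsSplitsDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HdfsSplitsDemo"))

        // textFile uses Hadoop's TextInputFormat under the hood, so input
        // splits (and hence partitions) follow HDFS block boundaries by default.
        // Hypothetical path: replace with a real file on your cluster.
        val lines = sc.textFile("hdfs:///data/large-file.txt")

        // For a file stored as N HDFS blocks this typically prints N
        // (a larger minPartitions argument can raise it further).
        println(s"Partitions: ${lines.getNumPartitions}")

        sc.stop()
      }
    }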
Upvotes: 7