user3086871

Reputation: 721

Can I run Spark with a segmented file on each slave node?

Imagine I have two slaves and one master. Previously, I copied the same data to every slave node.

JavaPairRDD<IntWritable, VectorWritable> seqVectors = sc.sequenceFile(inputPath, IntWritable.class,
            VectorWritable.class);

Here inputPath is not an HDFS path, but a local path that each slave node has access to. Now, however, I am considering a situation where each slave holds only part of the data, and I want to use the same code without installing or working with HDFS. The problem is that the program runs without any error but does not produce any result, because:

  1. The master has no data at the "inputPath".
  2. The slaves have partial data at the "inputPath", but the master does not pick it up and distribute the workload across them.

My question is: how can I run my program in this new situation, without any third-party software?

Upvotes: 0

Views: 35

Answers (1)

Alper t. Turker

Reputation: 35249

You cannot. If you want to run Spark

without installing/working with HDFS

(or other distributed storage), you have to provide a full copy of the data on each node, including the driver. Obviously, this is not very useful in practice.
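
For completeness, here is a minimal sketch of that full-copy approach, assuming Mahout's VectorWritable and a hypothetical local path: the complete sequence file has to exist at the same path on the driver and on every worker, because any task may be scheduled on any node.

import org.apache.hadoop.io.IntWritable;
import org.apache.mahout.math.VectorWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalSequenceFileExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LocalSequenceFileExample");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // file:// points at the local filesystem. The COMPLETE sequence file must
        // be present at this exact path on the driver and on every executor node;
        // Spark will not ship or merge partial copies for you.
        String inputPath = "file:///data/vectors.seq"; // hypothetical path

        JavaPairRDD<IntWritable, VectorWritable> seqVectors =
                sc.sequenceFile(inputPath, IntWritable.class, VectorWritable.class);

        System.out.println("Number of vectors: " + seqVectors.count());
        sc.stop();
    }
}

If the data is split across machines and you do not want HDFS, the usual alternatives are a shared filesystem (e.g. NFS mounted at the same path everywhere) or some other storage layer Spark can read from; plain, unshared local paths with different contents per node will not work.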

Upvotes: 1
