user3086871

Reputation: 721

Can I run Spark with a segmented file on each slave node?

Imagine I have two slaves and one master. Previously, I copied the same data to every slave node.

JavaPairRDD<IntWritable, VectorWritable> seqVectors = sc.sequenceFile(inputPath, IntWritable.class,
            VectorWritable.class);

Here inputPath is not an HDFS path, but a local path that each slave node has access to. Now, however, I am considering a situation where each slave holds only part of the data, and I want to use the same code without installing or working with HDFS. The problem is that the program runs without any error but does not produce any result, because:

  1. The master has no data at the "inputPath".
  2. The slaves have partial data at the "inputPath", but the master does not pick it up and distribute the workload across them.

My question is: how can I run my program in this new situation, without any third-party software?

Upvotes: 0

Views: 35

Answers (1)

Alper t. Turker

Reputation: 35249

You cannot. If you want to run Spark

without installing/working with HDFS

(or other distributed storage), you have to provide a full copy of the data on each node, including the driver. Obviously, this is not very useful in practice.
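
For completeness, here is a minimal sketch of that full-copy approach, assuming Mahout's VectorWritable and a hypothetical local path: the complete sequence file has to exist at the same path on the driver and on every worker, because any task may be scheduled on any node.

import org.apache.hadoop.io.IntWritable;
import org.apache.mahout.math.VectorWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalSequenceFileExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LocalSequenceFileExample");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // file:// points at the local filesystem. The COMPLETE sequence file must
        // be present at this exact path on the driver and on every executor node;
        // Spark will not ship or merge partial copies for you.
        String inputPath = "file:///data/vectors.seq"; // hypothetical path

        JavaPairRDD<IntWritable, VectorWritable> seqVectors =
                sc.sequenceFile(inputPath, IntWritable.class, VectorWritable.class);

        System.out.println("Number of vectors: " + seqVectors.count());
        sc.stop();
    }
}

If the data is split across machines and you do not want HDFS, the usual alternatives are a shared filesystem (e.g. NFS mounted at the same path everywhere) or some other storage layer Spark can read from; plain, unshared local paths with different contents per node will not work.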

Upvotes: 1
