Reputation: 761
The Apache Spark documentation says that "within each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads". Can someone explain how to achieve this concurrency for the following sample code?
SparkConf conf = new SparkConf().setAppName("Simple_App");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> file1 = sc.textFile("/path/to/test_doc1");
JavaRDD<String> file2 = sc.textFile("/path/to/test_doc2");
System.out.println(file1.count());
System.out.println(file2.count());
These two jobs are independent and must run concurrently.
Thank You.
Upvotes: 22
Views: 22656
Reputation: 14891
Using the Scala parallel collections feature:
Range(0, 10).par.foreach { project_id =>
  spark.table("store_sales")
    .selectExpr(project_id + " as project_id", "count(*) as cnt")
    .write
    .saveAsTable(s"counts_$project_id")
}
PS. The above launches up to 10 parallel Spark jobs, but it may be fewer depending on the number of cores available on the Spark driver. The Futures-based method by GQ is more flexible in this regard; you can also cap the parallelism of the parallel collection explicitly, as shown in the sketch below.
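If you want to control how many jobs run at once instead of relying on the driver's core count, you can give the parallel collection its own task support. This is only a minimal sketch, assuming Scala 2.12 (on 2.13+ parallel collections live in the separate scala-parallel-collections module), a SparkSession named spark, and an arbitrary pool size of 4:
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

val projects = Range(0, 10).par
// Limit the number of Spark jobs submitted concurrently to 4 (example value)
projects.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(4))

projects.foreach { project_id =>
  spark.table("store_sales")
    .selectExpr(project_id + " as project_id", "count(*) as cnt")
    .write
    .saveAsTable(s"counts_$project_id")
}
Each foreach task blocks its thread until its write finishes, so the pool size is effectively the number of jobs in flight at any moment.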
Upvotes: 0
Reputation: 4667
Try something like this:
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final JavaSparkContext sc = new JavaSparkContext("local[2]", "Simple_App");
ExecutorService executorService = Executors.newFixedThreadPool(2);

// Submit job 1 from its own thread
Future<Long> future1 = executorService.submit(new Callable<Long>() {
    @Override
    public Long call() throws Exception {
        JavaRDD<String> file1 = sc.textFile("/path/to/test_doc1");
        return file1.count();
    }
});

// Submit job 2 from another thread
Future<Long> future2 = executorService.submit(new Callable<Long>() {
    @Override
    public Long call() throws Exception {
        JavaRDD<String> file2 = sc.textFile("/path/to/test_doc2");
        return file2.count();
    }
});

// Block until job 1 completes
System.out.println("File1:" + future1.get());
// Block until job 2 completes
System.out.println("File2:" + future2.get());

// Shut down the pool so its non-daemon worker threads do not keep the JVM alive
executorService.shutdown();
Upvotes: 24