rura6502

Reputation: 385

How does Apache Spark process non-RDD code such as System.out, for, and while?

When I write an iterator (a for or while loop) or other non-RDD code, how does Spark actually process it, and how is it separated from the RDD work?

Like this:

public static void main(String[] args) {
    JavaSparkContext sc = ....;
    int sum = 0;
    for (int i = 0; i < 1000000; i++)  // fixed condition: was 0<1000000, an infinite loop
        sum += i;
    sc.writeHadoop("/user/text/test.txt");  // placeholder for a Spark write action
}

Upvotes: 0

Views: 51

Answers (1)

Thiago Baldim

Reputation: 7742

This code is handled by the Driver: every block of code that is outside the Apache Spark framework runs in the Driver.

That is why you need to understand how much memory your driver will use: if you do complex computation there, or call collect() for any reason, all of that processing happens in the Driver. See the image below:

*(image: Spark architecture diagram showing the Driver and the Executors)*
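To illustrate the collect() point: collect() pulls every partition of an RDD back into driver memory, so the driver must be sized to hold the whole dataset. A minimal sketch, assuming a JavaSparkContext named sc already exists:

```java
// collect() materializes the entire RDD on the driver JVM;
// on a large dataset this is what causes driver OutOfMemoryError.
List<Integer> data = sc.parallelize(java.util.Arrays.asList(1, 2, 3, 4))
                       .collect();  // all elements now live in driver memory

// Prefer actions that keep data distributed (e.g. saveAsTextFile),
// or take(n)/sample for inspecting a small subset.
```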

So everything that runs outside Spark, as your code shows:

public static void main(String[] args) {
    JavaSparkContext sc = ....;
    int sum = 0;
    for (int i = 0; i < 1000000; i++)  // fixed condition: was 0<1000000, an infinite loop
        sum += i;
    sc.writeHadoop("/user/text/test.txt");  // placeholder for a Spark write action
}

Everything before the sc.writeHadoop call runs entirely in the driver. Only after it finishes are the workers invoked.
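If you want the executors, rather than the driver, to do the summation, the loop can be rewritten as an RDD computation. A hedged sketch, assuming a local master and reusing the question's (hypothetical) output path; saveAsTextFile stands in for the nonexistent writeHadoop:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DistributedSum {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("sum").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Building the input range still happens on the driver...
        List<Integer> range = new ArrayList<>();
        for (int i = 0; i < 1000000; i++) range.add(i);

        // ...but the reduction is distributed across the executors.
        long sum = sc.parallelize(range)
                     .map(i -> (long) i)   // widen to long: the sum overflows int
                     .reduce(Long::sum);

        // A real Spark write action, replacing the placeholder writeHadoop.
        sc.parallelize(Collections.singletonList(sum))
          .saveAsTextFile("/user/text/test.txt");  // path taken from the question

        sc.stop();
    }
}
```

The map/reduce pair runs on the workers; only the tiny final result travels back to the driver, which is the separation the question asks about.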

Upvotes: 2
