Cheeko

Reputation: 1223

Spark submit using mesos dcos cli

I'm trying to start a Spark streaming job on Mesos using the DCOS CLI. I'm able to start the job, but my program expects a config file to be passed as a CLI parameter. How do I achieve this with dcos spark run --submit-args?

I tried --files http://server/path/to//file hoping it would download the file, but that didn't work. The driver starts but then fails because the config file is missing.

I also tried rolling up the jar and the config file into a tar and submitting that. I can see in the Mesos logs that the tar was fetched and untarred, and both the config and jar files are present in the working directory. But the job fails with a ClassNotFoundException. I suspect something was not right about how spark-submit was started.

dcos spark run --submit-args="--supervise --deploy-mode cluster --class package.name.classname http://file-server:8000/Streaming.tar.gz Streaming.conf"

Any hint on how to proceed? Also, in which log file can I see the underlying spark-submit command used by DCOS?

Upvotes: 3

Views: 1089

Answers (2)

Andrey Dyatlov

Reputation: 1668

Here is an example of a command you can launch to make it work:

dcos spark run --submit-args='--conf spark.mesos.uris=https://s3-us-west-2.amazonaws.com/andrey-so-36323287/pi.conf --class JavaSparkPiConf https://s3-us-west-2.amazonaws.com/andrey-so-36323287/sparkPi_without_config_file.jar /mnt/mesos/sandbox/pi.conf'

Where

--conf spark.mesos.uris=... A comma-separated list of URIs to be downloaded to the sandbox when a driver or executor is launched by Mesos. This applies to both coarse-grained and fine-grained mode.

/mnt/mesos/sandbox/pi.conf A path to the downloaded file, which your main class receives as its 0th parameter (see the code snippet below). /mnt/mesos/sandbox/ is a standard path inside a container that is mapped to the corresponding Mesos task sandbox.

import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

public final class JavaSparkPiConf {

  public static void main(String[] args) throws Exception {
    SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);

    // args[0] is the config path inside the sandbox, e.g. /mnt/mesos/sandbox/pi.conf
    Scanner scanner = new Scanner(new FileInputStream(args[0]));
    int slices;
    if (scanner.hasNextInt()) {
      slices = scanner.nextInt();
    } else {
      slices = 2;
    }
    int n = 100000 * slices;
    List<Integer> l = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      l.add(i);
    }

    JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

    int count = dataSet.map(new Function<Integer, Integer>() {
      @Override
      public Integer call(Integer integer) {
        double x = Math.random() * 2 - 1;
        double y = Math.random() * 2 - 1;
        return (x * x + y * y < 1) ? 1 : 0;
      }
    }).reduce(new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer integer, Integer integer2) {
        return integer + integer2;
      }
    });

    System.out.println("Pi is roughly " + 4.0 * count / n);

    jsc.stop();
  }
}

Upvotes: 4

Michael Gummelt

Reputation: 723

Streaming.conf is just a string that will be passed to your driver. Your driver must be able to see it. The easiest way to do this is to place the file in an accessible location, then specify that you want it downloaded to the sandbox via spark.mesos.uris [1]. Alternatively, you could write your application to support reading from a remote location and just pass that location on the CLI.
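The "read from a remote location" alternative could be sketched like this (illustrative only; the RemoteConfig class and its fetch helper are hypothetical and not part of either answer — the driver would call fetch with whatever URL is passed on the CLI):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public final class RemoteConfig {
  // Reads the whole config file from any URL the JVM can open
  // (http://, file://, ...), so the driver does not depend on the
  // file having been downloaded into the Mesos sandbox.
  public static String fetch(String location) throws Exception {
    URL url = new URL(location);
    StringBuilder sb = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    // e.g. the driver is started with http://file-server:8000/Streaming.conf
    System.out.println(fetch(args[0]));
  }
}
```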

--files is used to place files on the executors, but you're trying to pass a file to the driver, so that won't work.

[1] http://spark.apache.org/docs/latest/running-on-mesos.html
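Put together, the submit command from the question could look something like this sketch (the host name, jar, and config file names are placeholders taken from the question, not a verified deployment):

```shell
# Hypothetical sketch: Streaming.jar and Streaming.conf are served
# separately instead of being bundled into a tar, and the config is
# pulled into the sandbox via spark.mesos.uris.
dcos spark run --submit-args="\
  --conf spark.mesos.uris=http://file-server:8000/Streaming.conf \
  --supervise --deploy-mode cluster \
  --class package.name.classname \
  http://file-server:8000/Streaming.jar \
  /mnt/mesos/sandbox/Streaming.conf"
```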

Michael Gummelt
Mesosphere

Upvotes: 2
