Reputation: 45
I can run the pi.py example on YARN successfully with:
./bin/spark-submit --master yarn --deploy-mode cluster examples/src/main/python/pi.py
But when I run ./bin/spark-submit --master yarn --deploy-mode cluster examples/src/main/python/ml/logistic_regression_with_elastic_net.py
it fails with the error message: Container exited with a non-zero exit code 1
By comparing the two files, I found that after adding from pyspark.ml.classification import LogisticRegression
to pi.py, running pi.py fails too.
But I don't know how to fix it. And I have another question: when I run the ML example I have to upload my own data file to HDFS, so I tried using --files.
Is that right? If so, since I don't know the path of the data file on HDFS, the Python script can't find the data file. (I can see the path after the file is uploaded to HDFS, i.e. hdfs://master:9000/user/root/.sparkStaging/application_1488329960574_0011/mnist8m_800
but by then it's too late. Can I specify the path when I submit?)
Upvotes: 0
Views: 306
Reputation: 13946
To run logistic_regression_with_elastic_net you need to upload the sample libsvm data to HDFS like this:
$ hdfs dfs -mkdir -p data/mllib
$ hdfs dfs -put data/mllib/sample_libsvm_data.txt data/mllib
Then the example will work in both yarn-client and yarn-cluster modes.
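This works because the example loads the data with a relative path, and HDFS resolves relative paths against the submitting user's home directory (/user/&lt;username&gt; by default). A minimal sketch of that resolution, assuming the default home-directory layout (the helper below is hypothetical, just to illustrate where the file ends up):

```python
# Hypothetical helper mirroring how HDFS resolves a relative path against the
# submitting user's home directory (/user/<username> by default).
def resolve_hdfs_path(relative_path, user="root"):
    return "/user/{}/{}".format(user, relative_path)

# The example's load("data/mllib/sample_libsvm_data.txt") call then reads from:
print(resolve_hdfs_path("data/mllib/sample_libsvm_data.txt"))
# -> /user/root/data/mllib/sample_libsvm_data.txt
```

So as long as you ran hdfs dfs -put as the same user that submits the job, the relative path in the example matches the uploaded location.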
As far as I know, files uploaded with --files
cannot be read through the Spark session (as the regression example does).
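One workaround is to skip --files and instead pass the full HDFS path to the script as a command-line argument at submit time. A rough sketch, assuming you modify the script yourself (the argument handling below is a hypothetical addition, not part of the shipped example):

```python
import sys

# Hypothetical sketch: accept the dataset location as a submit-time argument,
# e.g. spark-submit ... my_script.py hdfs://master:9000/user/root/mnist8m_800
def data_path_from_args(argv, default="data/mllib/sample_libsvm_data.txt"):
    # Use the submitted path if one was given, otherwise fall back to a default.
    return argv[1] if len(argv) > 1 else default

# Inside the script you would then load it with something like:
#   df = spark.read.format("libsvm").load(data_path_from_args(sys.argv))
print(data_path_from_args(["my_script.py",
                           "hdfs://master:9000/user/root/mnist8m_800"]))
```

This way the path is known before the job starts, instead of only appearing in the .sparkStaging directory after upload.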
Upvotes: 1