Reputation: 45
I can run the pi.py example on YARN successfully with:
./bin/spark-submit --master yarn --deploy-mode cluster examples/src/main/python/pi.py
But when I run ./bin/spark-submit --master yarn --deploy-mode cluster examples/src/main/python/ml/logistic_regression_with_elastic_net.py
it fails with the error message: Container exited with a non-zero exit code 1
By comparing the two files, I found that after adding from pyspark.ml.classification import LogisticRegression
to pi.py, running pi.py fails too.
But I don't know how to fix it. And I have another question: when I run the ML example I have to upload my own data file to HDFS, so I tried using --files.
Is that right? If so, since I don't know the path of the data file on HDFS, the Python script can't find the data file. (I can see the path after the file is uploaded to HDFS, i.e. hdfs://master:9000/user/root/.sparkStaging/application_1488329960574_0011/mnist8m_800
but by then it's too late. Can I specify the path when I submit?)
Upvotes: 0
Views: 306
Reputation: 13946
To run logistic_regression_with_elastic_net you need to upload the sample libsvm data to HDFS like this:
$ hdfs dfs -mkdir -p data/mllib
$ hdfs dfs -put data/mllib/sample_libsvm_data.txt data/mllib
Then the example will work in both yarn-client and yarn-cluster modes.
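This works because the example loads the data with a relative path, and HDFS resolves relative paths against the submitting user's home directory (/user/&lt;username&gt; by default). A minimal sketch of that resolution, assuming the default home-directory layout (the helper below is hypothetical, just to illustrate where the file ends up):

```python
# Hypothetical helper mirroring how HDFS resolves a relative path against the
# submitting user's home directory (/user/<username> by default).
def resolve_hdfs_path(relative_path, user="root"):
    return "/user/{}/{}".format(user, relative_path)

# The example's load("data/mllib/sample_libsvm_data.txt") call then reads from:
print(resolve_hdfs_path("data/mllib/sample_libsvm_data.txt"))
# -> /user/root/data/mllib/sample_libsvm_data.txt
```

So as long as you ran hdfs dfs -put as the same user that submits the job, the relative path in the example matches the uploaded location.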
As far as I know, files uploaded with --files
cannot be read through the Spark session (as the regression example does).
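One workaround is to skip --files and instead pass the full HDFS path to the script as a command-line argument at submit time. A rough sketch, assuming you modify the script yourself (the argument handling below is a hypothetical addition, not part of the shipped example):

```python
import sys

# Hypothetical sketch: accept the dataset location as a submit-time argument,
# e.g. spark-submit ... my_script.py hdfs://master:9000/user/root/mnist8m_800
def data_path_from_args(argv, default="data/mllib/sample_libsvm_data.txt"):
    # Use the submitted path if one was given, otherwise fall back to a default.
    return argv[1] if len(argv) > 1 else default

# Inside the script you would then load it with something like:
#   df = spark.read.format("libsvm").load(data_path_from_args(sys.argv))
print(data_path_from_args(["my_script.py",
                           "hdfs://master:9000/user/root/mnist8m_800"]))
```

This way the path is known before the job starts, instead of only appearing in the .sparkStaging directory after upload.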
Upvotes: 1