Reputation: 159
I'm a beginner with Spark, Hadoop and YARN. I installed Spark following https://spark.apache.org/docs/2.3.0/ and Hadoop/YARN following https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html. My aim is to run a Spark application on a YARN cluster, but I have problems. How do we know when our setup works? Let me show you my example.
After finishing the setup, I tried to run the test jar examples/jars/spark-examples*.jar. When I run Spark locally with:
./bin/spark-submit --class org.apache.spark.examples.SparkPi examples/jars/spark-examples*.jar
I see at some point the line "Pi is roughly 3.1370956854784273". But when I run on the YARN cluster with:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples*.jar
I don't see "Pi is roughly 3.1370956854784273" in the console, and I don't know where to find it. I looked at the logs at http://localhost:8088/cluster/cluster but it doesn't appear there. Do you know where I should look?
Thanks for your help and have a nice day.
Upvotes: 5
Views: 10838
Reputation: 498
In YARN cluster mode, the output does not go to your console (where you submitted the job) but to the YARN logs themselves. So you can run:
yarn logs -applicationId application_1549879021111_0007 >application_1549879021111_0007.log
and then:
more application_1549879021111_0007.log
Then, inside more, you can search with /pattern,
where pattern is a word or expression from a print statement in your Python script. Usually I wrap my output in a marker, e.g.
print('####' + expression_to_print + '####')
Then I can type /#### to jump straight to my print.
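Putting those steps together, a one-liner can pull the marked lines straight out of the aggregated logs without opening a pager. The application ID below is a placeholder taken from the commands above; substitute your own:

```shell
# Fetch the aggregated logs and keep only the lines carrying the '####' marker.
# application_1549879021111_0007 is a placeholder - use your own application ID.
yarn logs -applicationId application_1549879021111_0007 | grep '####'
```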
Upvotes: 2
Reputation: 91
I ran into the same issue and was finally able to see "Pi is roughly 3.14..." after the following steps:
First, enable YARN log aggregation on every node by adding these lines to yarn-site.xml:
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
You may need to restart YARN and DFS after modifying yarn-site.xml.
Then check the logs from the command line:
yarn logs -applicationId <applicationID>
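With aggregation enabled, the SparkPi result from the question can be filtered directly out of the fetched log; the grep pattern is just the text the driver prints:

```shell
# Replace with the real ID printed by spark-submit / shown in the ResourceManager UI.
APP_ID=application_1549879021111_0007
# Dump the aggregated logs to a file, then search for the driver's output line.
yarn logs -applicationId "$APP_ID" > app.log
grep "Pi is roughly" app.log
```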
Hope it helps.
Upvotes: 1
Reputation: 191728
You need to find the Spark driver container in YARN, or in the Spark UI. From there, you can go to the Executors tab, where you will see the stdout
and stderr
links for each executor (plus the driver, which is where the final output will be).
Over time, YARN will evict these logs, which is why you need log aggregation enabled and the Spark History Server deployed.
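As a sketch, the History Server setup usually comes down to a few lines in conf/spark-defaults.conf; the hdfs:///spark-logs path here is an assumption, so use a directory that actually exists in your cluster:

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-logs
spark.history.fs.logDirectory    hdfs:///spark-logs
```

Then start the server with ./sbin/start-history-server.sh, after which completed applications (and links to their logs) show up at http://localhost:18080.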
FWIW, Cloudera has gone all-in on running Spark on Kubernetes in its recent announcements. Not sure what that means for the future of YARN (or HDFS, with Ceph and S3 being popular datastores in those deployments).
Upvotes: 0
Reputation: 257
You can view the same output in the ResourceManager UI using the application ID,
or get the entire log for the application with the following command:
yarn logs -applicationId <application ID>
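If you don't have the application ID handy, the YARN CLI can list it first; both subcommands below are standard YARN CLI, and the ID shown is a placeholder:

```shell
# List finished applications to find the ID of your run.
yarn application -list -appStates FINISHED
# Then dump that application's full log (placeholder ID - substitute your own).
yarn logs -applicationId application_1549879021111_0007
```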
Upvotes: 3
Reputation: 168
You can redirect the console output to a file. This writes the output of your Spark submission into that file, and you can then run tail -n 100 -f on the consoleoutfile.txt mentioned below to follow your console output:
./submit_command > local_fs_path/consoleoutfile.txt 2>&1
tail -n 100 -f local_fs_path/consoleoutfile.txt
Upvotes: -1