nmr


Console output from worker nodes to a file in Spark cluster mode

I am running a PySpark script using spark-submit. The job runs successfully.

Now I am trying to collect the console output of this job in a file, as shown below.

spark-submit in yarn-client mode

spark-submit --master yarn-client --num-executors 5 --executor-cores 5 --driver-memory 5G --executor-memory 10G --files /usr/hdp/current/spark-client/conf/hive-site.xml --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar --py-files customer_profile/customer_helper.py#customer_helper.py,customer_profile/customer_json.json customer_profile/customer.py  > /home/$USER/logs/customer_2018_10_26 2>&1

With this, all the console output is redirected to the file /home/$USER/logs/customer_2018_10_26, including every log level and any stack traces.

spark-submit in yarn-cluster mode

spark-submit --master yarn-cluster --num-executors 5 --executor-cores 5 --driver-memory 5G --executor-memory 10G --files /usr/hdp/current/spark-client/conf/hive-site.xml --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar --py-files customer_profile/customer_helper.py#customer_helper.py,customer_profile/customer_json.json customer_profile/customer.py  > /home/$USER/logs/customer_2018_10_26 2>&1

Using yarn-cluster mode, I am unable to redirect the console output to the file /home/$USER/logs/customer_2018_10_26.

The problem is that if my job fails in yarn-client mode, I can go to the file /home/$USER/logs/customer_2018_10_26 and easily look for the errors.

But if my job fails in yarn-cluster mode, the stack trace is not written to the file /home/$USER/logs/customer_2018_10_26. The only way I can debug the error is by using yarn logs.

I would like to avoid using the yarn logs option. Instead, I want to see the error stack trace in the file /home/$USER/logs/customer_2018_10_26 itself while using yarn-cluster mode.

How can I achieve that?
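
For context, in yarn-cluster mode the driver runs inside the YARN application master on the cluster, so its stdout and stderr never reach the shell that ran spark-submit; the redirected file only receives the client-side status messages. One possible workaround, offered as a sketch rather than a confirmed fix, is to script the yarn logs step so that its output is appended to the same file after the job ends. The code below assumes that the redirected client output contains the application ID in the usual application_<timestamp>_<sequence> form, that the application has finished, and that YARN log aggregation is enabled. The helper names find_application_id and append_yarn_logs are hypothetical; it still relies on yarn logs underneath.

```python
import re
import subprocess

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def find_application_id(client_output):
    """Return the first YARN application ID found in the text, or None."""
    match = APP_ID_PATTERN.search(client_output)
    return match.group(0) if match else None


def append_yarn_logs(log_path):
    """Append the aggregated YARN logs for the job recorded in log_path.

    Assumptions: log_path holds the redirected spark-submit client output,
    the application has finished, and YARN log aggregation is enabled.
    Returns the application ID, or None if no ID was found in the file.
    """
    with open(log_path) as log_file:
        app_id = find_application_id(log_file.read())
    if app_id is None:
        return None
    # Open in append mode so the existing client output is preserved.
    with open(log_path, "a") as log_file:
        subprocess.run(
            ["yarn", "logs", "-applicationId", app_id],
            stdout=log_file,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return app_id
```

Calling append_yarn_logs with the path of the redirected file once spark-submit has returned would place the driver and executor logs, including any stack trace, after the client output in that file.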
