Reputation: 910
I'm not able to locate error logs or messages from println calls in Scala while running jobs on Spark in EMR. Where can I access these?
I'm submitting the Spark job, written in Scala, to EMR using script-runner.jar with the arguments --deploy-mode set to cluster and --master set to yarn. The job runs fine.
However, I do not see my println statements in the Amazon EMR UI where it lists stderr, stdout, etc. Furthermore, if my job fails, I can't see why it failed. All I see in stderr is this:
15/05/27 20:24:44 INFO yarn.Client: Application report from ResourceManager:
application identifier: application_1432754139536_0002
appId: 2
clientToAMToken: null
appDiagnostics:
appMasterHost: ip-10-185-87-217.ec2.internal
appQueue: default
appMasterRpcPort: 0
appStartTime: 1432758272973
yarnAppState: FINISHED
distributedFinalState: FAILED
appTrackingUrl: http://10.150.67.62:9046/proxy/application_1432754139536_0002/A
appUser: hadoop
Upvotes: 22
Views: 27595
Reputation: 364
Since you are using YARN, the easiest way to get the logs is the yarn logs command.
Example usage:
yarn logs -applicationId <applicationId> -am 1 | grep "Your app log"
This prints the logs from the first Application Master container, which in cluster deploy mode is where the driver runs.
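The application ID that yarn logs needs appears in the client output quoted in the question (the "application identifier" line). A minimal sketch of pulling it out and passing it on, assuming the client stderr has been saved to a file; extract_app_id and driver_logs are hypothetical helper names, and the yarn CLI on the node is assumed to support the -am option:

```shell
# Hypothetical helper: print the first YARN application ID found on stdin.
extract_app_id() {
  grep -Eo 'application_[0-9]+_[0-9]+' | head -n 1
}

# Hypothetical helper: fetch the Application Master (driver) logs for the
# application named in a saved copy of the client output.
# Assumes the yarn CLI is on PATH and supports -am.
driver_logs() {
  client_output="$1"
  app_id="$(extract_app_id < "$client_output")"
  if [ -z "$app_id" ]; then
    echo "no application ID found in $client_output" >&2
    return 1
  fi
  yarn logs -applicationId "$app_id" -am 1
}
```

For example, driver_logs client-stderr.txt | grep "Your app log" would filter the driver output for your own messages, where client-stderr.txt is a placeholder file name.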
Upvotes: 1
Reputation: 478
I also spent a lot of time figuring this out. Found logs in the following location: EMR UI Console -> Summary -> Log URI -> Containers -> application_xxx_xxx -> container_yyy_yy_yy -> stdout.gz.
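The files at that location are gzip-compressed, so they need decompressing before they can be searched. A small sketch, assuming stdout.gz has already been downloaded to the local machine (for instance with the AWS CLI); search_gz_log is a hypothetical helper name:

```shell
# Hypothetical helper: print lines matching a pattern from a gzipped log
# file that has already been downloaded, e.g. a container's stdout.gz.
search_gz_log() {
  pattern="$1"
  file="$2"
  # Decompress to stdout and filter; the file on disk is left untouched.
  gzip -dc -- "$file" | grep -- "$pattern"
}
```

For example, search_gz_log "Your app log" stdout.gz prints only the matching println lines.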
Upvotes: 7
Reputation: 1330
The event logs, the ones required by the spark-history-server, can be found at:
hdfs:///var/log/spark/apps
Upvotes: 1
Reputation: 2068
With the deploy mode set to cluster on YARN, the Spark driver, and hence the user code it executes, runs within the Application Master container. It sounds like you had EMR debugging enabled on the cluster, so the logs should also have been pushed to S3. In the S3 location, look at task-attempts/<applicationid>/<firstcontainer>/*.
Upvotes: 16
Reputation: 93
If you SSH into the master node of your cluster then you should be able to find the stdout, stderr, syslog and controller logs under:
/mnt/var/log/hadoop/steps/<stepname>
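If you don't know the step name, one option is to pick the most recently modified directory under that path. A rough sketch, assuming step directory names contain no newlines; latest_step_dir is a hypothetical helper name, and the exact file names inside the directory may differ on your cluster:

```shell
# Hypothetical helper: print the most recently modified entry under the
# steps directory (defaults to the path given above).
latest_step_dir() {
  steps_root="${1:-/mnt/var/log/hadoop/steps}"
  # ls -t sorts by modification time, newest first.
  newest="$(ls -1t "$steps_root" | head -n 1)"
  [ -n "$newest" ] && printf '%s/%s\n' "$steps_root" "$newest"
}
```

Running ls "$(latest_step_dir)" on the master node then lists the log files of the latest step.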
Upvotes: 6
Reputation: 7452
If you submit your job with emr-bootstrap, you can specify an S3 bucket as the log directory with --log-uri.
Upvotes: 0