Reputation: 1193
I am running my Spark application on EMR, and have several println() statements. Other than the console, where do these statements get logged?
My S3 aws-logs directory structure for my cluster looks like:
node
├── i-0031cd7a536a42g1e
│   ├── applications
│   ├── bootstrap-actions
│   ├── daemons
│   ├── provision-node
│   └── setup-devices
containers/
├── application_12341331455631_0001
│   ├── container_12341331455631_0001_01_000001
Upvotes: 10
Views: 10389
Reputation: 1963
You can find the println output in a few places:
1. In the S3 logs, under containers/application_.../container_.../stdout (though this takes a few minutes to populate after the application finishes)
2. Through the YARN CLI: yarn logs -applicationId <Application ID> -log_files <log_file_type>
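For example, run on the master node (the application ID here is the one from your directory listing, just for illustration):
yarn application -list
yarn logs -applicationId application_12341331455631_0001 -log_files stdout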
Upvotes: 14
Reputation: 1961
There is a very important thing to consider when printing from Spark: is the code executed in the driver, or does it run in an executor?
For example, the following prints to the console, because collect() brings the data back to the driver:
for i in your_rdd.collect():
    print(i)
But the following will run within an executor and thus it will be written in the Spark logs:
def run_in_executor(value):
    print(value)

# use foreach (an action) rather than the lazy map, so the prints actually run on the executors
your_rdd.foreach(run_in_executor)
Coming back to your original question: the second case writes to the executor log location. On EMR, logs are usually written to the master node under /mnt/var/log/hadoop/steps, but it is better to configure the cluster to ship logs to an S3 bucket with --log-uri; that way they are much easier to find.
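For instance, a minimal sketch of launching a cluster with logs shipped to S3 (the name, release label, instance settings, and bucket are all placeholders):
aws emr create-cluster \
  --name "spark-app" \
  --release-label emr-5.36.0 \
  --applications Name=Spark \
  --use-default-roles \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --log-uri s3://your-bucket/aws-logs/
With that in place, the containers/application_.../container_.../stdout files from your directory listing end up under that S3 prefix, per cluster.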
Upvotes: 3