Reputation: 842
I would like to keep a trace of the Spark master logs so that error logs are preserved when they happen. I know the worker logs are available on the web UI, but I'm not sure they show the same kind of errors as the master.
I found that conf/log4j.properties has to be modified, but my attempts so far don't work.
Default configuration plus an added file appender:
# Set everything to be logged to the console
log4j.rootCategory=INFO, console, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
Attempt to set up the file appender:
### Custom log file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.fileName=/var/data/log/MasterLogs/master.log
log4j.appender.file.ImmediateFlush=true
## Set the append to false, overwrite
log4j.appender.file.Append=false
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
##Define the layout for file appender
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
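After editing the file, I restart the master so the new settings are picked up (as far as I understand, the standalone daemons only read conf/log4j.properties when the JVM starts):

./sbin/stop-master.sh
./sbin/start-master.sh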
Upvotes: 6
Views: 22830
Reputation: 3692
You need to create two log4j.properties files, one for the driver and one for the executors, and pass their paths in the Java options of the driver and executors when you submit your application with spark-submit, as below:
spark-submit --class MAIN_CLASS \
  --driver-java-options "-Dlog4j.configuration=file:PATH_OF_LOG4J_PROPERTIES_FOR_DRIVER" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:PATH_OF_LOG4J_PROPERTIES_FOR_EXECUTOR" \
  --master spark://MASTER_IP:PORT JAR_PATH
Here is an example of a log4j.properties file you might specify:
# Set everything to be logged to a file
log4j.rootCategory=INFO,FILE
# RollingFileAppender is required for MaxFileSize/MaxBackupIndex to take effect
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File={Enter path of the file}
log4j.appender.FILE.MaxFileSize=10MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
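Note that the executor properties file has to exist at the given path on each worker node. If it does not, one option is to ship it with spark-submit's --files flag and reference it by a relative file: URL, since shipped files land in each executor's working directory. A sketch with made-up class and file names (log4j-driver.properties, log4j-executor.properties, app.jar are placeholders):

spark-submit --class com.example.Main \
  --driver-java-options "-Dlog4j.configuration=file:log4j-driver.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j-executor.properties" \
  --files log4j-executor.properties \
  app.jar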
You can also check this blog post for more details: https://blog.knoldus.com/2016/02/23/logging-spark-application-on-standalone-cluster/
Upvotes: 9
Reputation: 1202
Use the following command; it will write both the output and the console log into a file:
hadoop@osboxes:~/spark-2.0.1-bin-hadoop2.7/bin$ ./spark-submit test.py > tempoutfile.txt 2>&1
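If you also want to watch the logs scroll by while they are being captured, piping through tee (a standard Unix tool, nothing Spark-specific) is an equivalent alternative:

./spark-submit test.py 2>&1 | tee tempoutfile.txt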
Upvotes: 4