Reputation: 1023
I am using MapR 5.2 with Spark 2.1.0, and I run my Spark application jar in YARN cluster mode.
I have tried every option I could find, but without success.
This is our production environment, and for this particular Spark job I need it to pick up my log4j-Driver.properties file, which is in the src/main/resources folder (I confirmed the properties file is present by opening the jar).
1) Contents of my log4j configuration file, log4j-Driver.properties
log4j.rootCategory=DEBUG, FILE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/users/myuser/logs/Myapp.log
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.Append=true
log4j.appender.FILE.MaxFileSize=100MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
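As a side note, the root logger above references only the FILE appender, so the console appender settings are defined but unused. To check which configuration file Log4j 1.x actually loads, it can be started with -Dlog4j.debug, which makes it print the URL it resolved. A minimal local sanity check might look like this (a hypothetical invocation for illustration, not part of the original post; the class and jar names are taken from the question):

```shell
# Hypothetical local run to verify which log4j config is picked up.
# With -Dlog4j.debug, Log4j prints lines such as
# "log4j: Using URL [...] for automatic log4j configuration."
java -Dlog4j.debug \
     -Dlog4j.configuration=file:/users/myuser/log4j-Driver.properties \
     -cp myapp_2.11-1.0.jar MyAppDriver
```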
2) My spark-submit script
propertyFile=application.properties
spark-submit --class MyAppDriver \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/users/myuser/log4j-Driver.properties" \
--master yarn --deploy-mode cluster \
--files /users/myuser/log4j-Driver.properties,/opt/mapr/spark/spark-2.1.0/conf/hive-site.xml,/users/myuser/application.properties \
/users/myuser/myapp_2.11-1.0.jar $propertyFile
For now, all I need is to get my driver logs written to the directory specified in the properties file above. If that succeeds, I will tackle the executor logs as well. But first I need the driver log written to the local filesystem (on an edge node of our cluster).
Upvotes: 1
Views: 4789
Reputation: 6270
/users/myuser/log4j-Driver.properties seems to be the path to the file on your local computer, so you were right to use it for the --files argument.
The problem is that no such file exists on the driver and/or executors, so when you pass file:/users/myuser/log4j-Driver.properties as an argument to -Dlog4j.configuration, Log4j fails to find it.
Since you run on YARN, files listed as arguments to --files are submitted to HDFS. Each application submission gets its own base directory in HDFS, where all the files are put by spark-submit.
To refer to these files, use relative paths. In your case, --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-Driver.properties" should work.
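Putting that together, a corrected submit script might look like the sketch below (based on the paths from the question; note the = between spark.driver.extraJavaOptions and the -D flag, and the bare relative file name, which resolves inside the YARN container's working directory where --files places the shipped files):

```shell
propertyFile=application.properties

# --files ships the listed files into the application's YARN staging
# directory; inside the container they appear in the working directory,
# so -Dlog4j.configuration can reference the properties file by name alone.
spark-submit --class MyAppDriver \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-Driver.properties" \
  --master yarn --deploy-mode cluster \
  --files /users/myuser/log4j-Driver.properties,/opt/mapr/spark/spark-2.1.0/conf/hive-site.xml,/users/myuser/application.properties \
  /users/myuser/myapp_2.11-1.0.jar "$propertyFile"
```

One caveat: in cluster deploy mode the driver runs on a cluster node chosen by YARN, so the path configured in the FILE appender is resolved on that node's local filesystem, not on the edge node.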
Upvotes: 4