Reputation: 525
How can I write custom executor logs to HDFS through Log4j? I tried, but the logs were not created in HDFS. Please confirm whether this is possible by any means. My log4j configuration follows.
(Note: we were able to view the custom log messages as part of the executor logs in the Spark history server UI, which pulls the executor logs from YARN's default HDFS aggregation directory in a non-readable format, but they were not written to the custom logging directory or custom file that I configured below.)
log4j.properties:
log4j.appender.myConsoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.myConsoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myConsoleAppender.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=hdfs:///tmp/driverlogs/sparker-driver.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=hdfs:///tmp/executorlogs/SparkUser.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.rootLogger=DEBUG,RollingAppender,myConsoleAppender
log4j.logger.myLogger=INFO,RollingAppenderU
log4j.logger.spark.storage=INFO, RollingAppender
log4j.additivity.spark.storage=false
log4j.logger.spark.scheduler=INFO, RollingAppender
log4j.additivity.spark.scheduler=false
log4j.logger.spark.CacheTracker=INFO, RollingAppender
log4j.additivity.spark.CacheTracker=false
log4j.logger.spark.CacheTrackerActor=INFO, RollingAppender
log4j.additivity.spark.CacheTrackerActor=false
log4j.logger.spark.MapOutputTrackerActor=INFO, RollingAppender
log4j.additivity.spark.MapOutputTrackerActor=false
log4j.logger.spark.MapOutputTracker=INFO, RollingAppender
log4j.additivity.spark.MapOutputTracker=false
Scala Spark program:
package com.wba.logtest.logtesting

import org.apache.log4j.{Level, LogManager}
import org.apache.spark._
import org.apache.spark.rdd.RDD

class Mapper(n: Int) extends Serializable {
  // @transient lazy val so the logger is re-created on each executor rather than serialized
  @transient lazy val log = LogManager.getLogger("myLogger")

  def doSomeMappingOnDataSetAndLogIt(rdd: RDD[Int]): RDD[String] =
    rdd.map { i =>
      log.info("mapping: " + i)
      (i + n).toString
    }
}

object Mapper {
  def apply(n: Int): Mapper = new Mapper(n)
}

object app {
  def main(args: Array[String]) {
    val log = LogManager.getRootLogger
    log.setLevel(Level.INFO)
    val conf = new SparkConf().setAppName("demo-app")
    val sc = new SparkContext(conf)

    log.info("Hello demo")
    val data = sc.parallelize(1 to 1000)
    val mapper = Mapper(1)
    val other = mapper.doSomeMappingOnDataSetAndLogIt(data)
    other.collect()
    log.info("I am done")
  }
}
Upvotes: 2
Views: 5802
Reputation: 463
The YARN-aggregated logs (stored in HDFS) are in a readable format, and you can retrieve them from the command line with yarn logs -applicationId .., passing your Spark application ID.
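For example (the application ID below is a placeholder; use the one printed by spark-submit or shown in the YARN ResourceManager UI):
yarn logs -applicationId application_1502792218752_0001 > demo-app-logs.txt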
Regarding the Spark driver logs, it depends on the mode you used to submit the Spark job. In client mode, the driver runs on the machine you submit from, so its logs go to your standard output. In cluster mode, the driver runs inside a YARN container, so its logs are part of the YARN application logs for the application ID that triggered the job.
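As a rough illustration (the jar name is a placeholder for your build artifact):
# client mode: driver logs appear on your console
spark-submit --master yarn --deploy-mode client --class com.wba.logtest.logtesting.app demo-app.jar
# cluster mode: driver logs are retrieved afterwards with yarn logs -applicationId ..
spark-submit --master yarn --deploy-mode cluster --class com.wba.logtest.logtesting.app demo-app.jar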
Otherwise, a good alternative is to ship log messages through a log4j SocketAppender connected to Logstash/Elasticsearch.
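A minimal sketch of such a configuration, assuming a Logstash instance with a log4j input listening on host logstash-host, port 4560 (both placeholders for your environment); the SocketAppender ships serialized logging events, so no layout is needed:
log4j.appender.logstash=org.apache.log4j.net.SocketAppender
log4j.appender.logstash.RemoteHost=logstash-host
log4j.appender.logstash.Port=4560
log4j.appender.logstash.ReconnectionDelay=10000
log4j.rootLogger=INFO,logstash,myConsoleAppender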
Upvotes: 2