Roshan Fernando

Reputation: 525

Logging Spark driver and executor logs on HDFS through Log4j

How can I write custom executor logs to HDFS through Log4j? I tried, but the logs were not created in HDFS. Please confirm whether this is possible by any means. My log4j configuration is below.

(Note: we were able to see the custom log messages as part of the executor logging in the Spark History Server UI, which pulls the executor logs from YARN's default HDFS directory in a non-readable format, but it did not use the custom logging directory or custom file that I specified below.)

log4j properties:

log4j.appender.myConsoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.myConsoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myConsoleAppender.layout.ConversionPattern=%d [%t] %-5p %c - %m%n

log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=hdfs:///tmp/driverlogs/sparker-driver.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n

log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=hdfs:///tmp/executorlogs/SparkUser.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n

log4j.rootLogger=DEBUG,RollingAppender,myConsoleAppender
log4j.logger.myLogger=INFO,RollingAppenderU

log4j.logger.spark.storage=INFO, RollingAppender
log4j.additivity.spark.storage=false
log4j.logger.spark.scheduler=INFO, RollingAppender
log4j.additivity.spark.scheduler=false
log4j.logger.spark.CacheTracker=INFO, RollingAppender
log4j.additivity.spark.CacheTracker=false
log4j.logger.spark.CacheTrackerActor=INFO, RollingAppender
log4j.additivity.spark.CacheTrackerActor=false
log4j.logger.spark.MapOutputTrackerActor=INFO, RollingAppender
log4j.additivity.spark.MapOutputTrackerActor=false
log4j.logger.spark.MapOutputTracker=INFO, RollingAppender
log4j.additivity.spark.MapOutputTracker=false

Scala Spark program:

package com.wba.logtest.logtesting
import org.apache.log4j.{Level, LogManager}
import org.apache.spark._
import org.apache.spark.rdd.RDD

class Mapper(n: Int) extends Serializable{
  @transient lazy val log = org.apache.log4j.LogManager.getLogger("myLogger")
  def doSomeMappingOnDataSetAndLogIt(rdd: RDD[Int]): RDD[String] =
    rdd.map{ i =>
      log.info("mapping: " + i)
      (i + n).toString
    }
}
object Mapper {
  def apply(n: Int): Mapper = new Mapper(n)
}
object app {
  def main(args: Array[String]) {
    val log = LogManager.getRootLogger
    log.setLevel(Level.INFO)
    val conf = new SparkConf().setAppName("demo-app")
    val sc = new SparkContext(conf)
    log.info("Hello demo")
    val data = sc.parallelize(1 to 1000)
    val mapper = Mapper(1)
    val other = mapper.doSomeMappingOnDataSetAndLogIt(data)
    other.collect()
    log.info("I am done")
  }
}


Upvotes: 2

Views: 5802

Answers (1)

Harold

Reputation: 463

The YARN logs (aggregated to HDFS) are in a readable format, and you can fetch them from the command line with yarn logs -applicationId .., passing your Spark application ID.
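
For example (the application ID below is a hypothetical placeholder; use the one reported when your application was submitted, and the output file name is just a choice):

yarn logs -applicationId application_1490000000000_0001 > demo-app-yarn.log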

Regarding the Spark driver logs, it depends on the mode you used to submit the Spark job. In client mode, the logs go to your standard output. In cluster mode, the logs are associated with the YARN application ID of the job.
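
For illustration, a cluster-mode submission of the program in the question might look like this (the jar name is a placeholder; the class name comes from the question):

spark-submit --master yarn --deploy-mode cluster \
  --class com.wba.logtest.logtesting.app \
  demo-app.jar

With --deploy-mode client instead, the driver runs on the submitting machine and its log output appears on your terminal.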

Otherwise, a good alternative is to ship log messages through a log4j SocketAppender connected to Logstash/Elasticsearch.
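
A minimal sketch of that approach in log4j 1.x properties, assuming a Logstash instance with a log4j input listening on logstash.example.com:4560 (host and port are placeholders):

# Send events over TCP to Logstash instead of writing files on HDFS
log4j.appender.logstash=org.apache.log4j.net.SocketAppender
log4j.appender.logstash.RemoteHost=logstash.example.com
log4j.appender.logstash.Port=4560
log4j.appender.logstash.ReconnectionDelay=10000
log4j.appender.logstash.LocationInfo=true

# Attach it to the custom logger used in the question
log4j.logger.myLogger=INFO,logstash

SocketAppender serializes logging events over TCP, so no layout is configured; the receiving side decides how to parse and index them.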

Upvotes: 2
