vijay

Reputation: 1233

Spark streaming job log size overflow

I have a Spark Streaming (2.1) job running in cluster mode, and I keep running into an issue where the job gets killed (by the resource manager) after a few weeks because the YARN container logs fill up the disk. Is there a way to avoid this?

I currently set the two settings below to limit the log size, but this has not helped with the situation above.

spark.executor.logs.rolling.maxRetainedFiles 2
spark.executor.logs.rolling.maxSize 107374182

Thanks!

Upvotes: 3

Views: 2226

Answers (3)

Thomas

Reputation: 11

You forgot this property:

spark.executor.logs.rolling.strategy size
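
Size-based rolling only takes effect once the strategy is set, so the three properties need to go together. A minimal sketch of how they might be passed at submit time, reusing the values from the question (the trailing ... stands for the rest of your usual spark-submit arguments):

spark-submit \
  --conf spark.executor.logs.rolling.strategy=size \
  --conf spark.executor.logs.rolling.maxSize=107374182 \
  --conf spark.executor.logs.rolling.maxRetainedFiles=2 \
  ...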

Upvotes: 1

Ravikumar

Reputation: 1131

The best approach is to create a separate log4j properties file for Spark Streaming jobs and, instead of the console appender, use a rolling file appender so you can control the file size and the number of files kept. You can create /etc/spark/conf/spark-stream-log4j.properties like the following:

log4j.rootCategory=INFO, filerolling

log4j.appender.filerolling=org.apache.log4j.RollingFileAppender
log4j.appender.filerolling.layout=org.apache.log4j.PatternLayout
log4j.appender.filerolling.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.filerolling.MaxFileSize=3MB
log4j.appender.filerolling.MaxBackupIndex=15
log4j.appender.filerolling.File=/var/log/hadoop-yarn/containers/spark.log

log4j.appender.filerolling.Encoding=UTF-8

## To minimize the logs
log4j.logger.org.apache.spark=ERROR
log4j.logger.com.datastax=ERROR
log4j.logger.org.apache.hadoop=ERROR
log4j.logger.hive=ERROR
log4j.logger.org.apache.hadoop.hive=ERROR
log4j.logger.org.spark_project.jetty.server.HttpChannel=ERROR
log4j.logger.org.spark_project.jetty.servlet.ServletHandler=ERROR
log4j.logger.org.apache.kafka=INFO

The spark-submit command then looks like this:

spark-submit \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=spark-stream-log4j.properties -XX:+UseConcMarkSweepGC -XX:OnOutOfMemoryError='kill -9 %p'" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=spark-stream-log4j.properties -XX:+UseConcMarkSweepGC -XX:OnOutOfMemoryError='kill -9 %p'" \
  --files /etc/spark/conf/spark-stream-log4j.properties

Upvotes: 3

wandermonk

Reputation: 7376

Spark generates a lot of INFO logging, so you can add the lines below to avoid unnecessary INFO output:

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Turn off all loggers under the "org" package (Spark, Hadoop, Jetty, ...)
Logger.getLogger("org").setLevel(Level.OFF);

Upvotes: 0
