Rajneesh Kumar

Reputation: 167

"Size in Memory" under storage tab of spark UI showing increase in RAM usage over time for spark streaming

I am using Spark Streaming in my application. Data arrives as streaming files every 15 minutes. I have allocated 10G of RAM to the Spark executors, and with this setting my application runs fine. However, looking at the Spark UI, under Storage tab -> Size in Memory, the RAM usage keeps increasing over time.

[Screenshot: Storage tab showing "Size in Memory" growth]

When I started the streaming job, the "Size in Memory" usage was in KB. It has now been 2 weeks, 2 days and 22 hours since I started the job, and the usage has grown to 858.4 MB. I have also noticed one more thing, under the Streaming heading:

[Screenshot: Streaming tab showing Processing Time and Total Delay]

When I started the job, the Processing Time and Total Delay (from the image) were about 5 seconds; after 16 days they have increased to 19-23 seconds, while the streaming file size is almost the same. Before increasing the executor memory to 10G, the Spark job kept failing roughly every 5 days (with the default executor memory of 1GB). Since increasing the executor memory to 10G, it has been running continuously for more than 16 days.

I am worried about memory issues. If the "Size in Memory" value keeps increasing like this, then sooner or later I will run out of RAM and the Spark job will fail again, even with 10G of executor memory. What can I do to avoid this? Do I need to change some configuration?

Just to give some context on my Spark application, I have enabled the following properties in the Spark context:

SparkConf sparkConf = new SparkConf()
        .setMaster(sparkMaster)
        .set("spark.streaming.receiver.writeAheadLog.enable", "true")
        .set("spark.streaming.minRememberDuration", "1440");

I have also enabled checkpointing as follows:

sc.checkpoint(hadoop_directory);
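For context, here is a minimal sketch of how these two snippets might fit together end to end. The 15-minute batch interval and the HDFS checkpoint directory come from the description above; sparkMaster, hadoopDirectory, and inputDirectory are placeholder names, textFileStream is only an assumed stand-in for the real source, and the actual processing logic is omitted:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

SparkConf sparkConf = new SparkConf()
        .setMaster(sparkMaster)
        .set("spark.streaming.receiver.writeAheadLog.enable", "true")
        .set("spark.streaming.minRememberDuration", "1440");

// Batch interval matches the 15-minute file arrival described above.
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.minutes(15));

// Checkpoint directory on HDFS, as in the snippet above.
jssc.checkpoint(hadoopDirectory);

// Placeholder file-based source; the real job reads the incoming streaming files.
JavaDStream<String> lines = jssc.textFileStream(inputDirectory);
lines.print();

jssc.start();
jssc.awaitTermination();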

One more thing I want to highlight: I had an issue while enabling checkpointing. Regarding that checkpointing issue, I have already posted a question at the following link: Spark checkpoining error when joining static dataset with DStream

I was not able to set up checkpointing the way I originally wanted, so I did it differently (as shown above) and it is working fine now. I am not asking the checkpointing question again; I mention it only so you can judge whether the current memory issue is somehow related to that earlier one.

Environment details: Spark 1.4.1 on a two-node cluster of CentOS 7 machines, with Hadoop 2.7.1.

Upvotes: 4

Views: 2800

Answers (1)

David Schwartz

Reputation: 182827

I am worried about memory issues. If the "Size in Memory" value keeps increasing like this, then sooner or later I will run out of RAM and the Spark job will fail again, even with 10G of executor memory.

No, that's not how RAM works. Running out is perfectly normal, and when you run out, you take RAM that you are using for less important purposes and use it for more important purposes.

For example, if your system has free RAM, it can try to keep everything it has written to disk in RAM as well. Who knows, somebody might try to read that data again, and having it in RAM will save an I/O operation. Since unused RAM is simply wasted (it's not like you can use 1GB less today in order to use 1GB more tomorrow; any RAM not used right now is I/O-saving potential that is lost forever), the system might as well use it for anything that might help. But that doesn't mean it can't evict those things from RAM when it needs the memory for some other purpose.

It is not at all unusual on a typical modern system for almost all of its RAM to be in use and, at the same time, for almost all of it to also be available for other purposes.
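As an illustration in Spark terms (assuming the growing "Size in Memory" figure comes from cached RDD blocks, which I haven't verified against your job): blocks persisted at a memory-only storage level are themselves candidates for eviction when the executor needs that storage space, so the number on the Storage tab is not permanently pinned RAM. A small, self-contained sketch:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

JavaSparkContext jsc = new JavaSparkContext(
        new SparkConf().setMaster("local[2]").setAppName("cache-eviction-demo"));

// Blocks cached with MEMORY_ONLY can be dropped again (least-recently-used
// first) when the executor needs the storage space, so cached data shows up
// under "Size in Memory" without permanently pinning that RAM.
JavaRDD<String> demo = jsc.parallelize(Arrays.asList("a", "b", "c"));
demo.persist(StorageLevel.MEMORY_ONLY());
demo.count();  // materializes the cached blocks

// MEMORY_AND_DISK would spill evicted blocks to disk instead of recomputing them.
// demo.persist(StorageLevel.MEMORY_AND_DISK());

jsc.stop();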

Upvotes: 1
