Reputation: 37
We are struggling with data flow from Kafka to HDFS managed by Flume. Data is not fully transported to HDFS because of the exceptions described below. However, the error looks misleading to us: we have enough space both in the channel's data directory and in HDFS. We suspect a problem with the channel configuration, but we have a similar configuration for other sources and it works correctly for them. If anyone has had to deal with this problem, I would be grateful for tips.
17 Aug 2017 14:15:24,335 ERROR [Log-BackgroundWorker-channel1] (org.apache.flume.channel.file.Log$BackgroundWorker.run:1204) - Error doing checkpoint
java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288000 bytes
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1003)
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:986)
at org.apache.flume.channel.file.Log.access$200(Log.java:75)
at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:1201)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
17 Aug 2017 14:15:27,552 ERROR [PollableSourceRunner-KafkaSource-kafkaSource] (org.apache.flume.source.kafka.KafkaSource.doProcess:305) - KafkaSource EXCEPTION, {}
org.apache.flume.ChannelException: Commit failed due to IO error [channel=channel1]
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:639)
at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
at org.apache.flume.source.kafka.KafkaSource.doProcess(KafkaSource.java:286)
at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:58)
at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288026 bytes
at org.apache.flume.channel.file.Log.rollback(Log.java:722)
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:637)
... 6 more
Flume configuration:
agent2.sources = kafkaSource
#sources defined
agent2.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
agent2.sources.kafkaSource.kafka.bootstrap.servers = …
agent2.sources.kafkaSource.kafka.topics = pega-campaign-response
agent2.sources.kafkaSource.channels = channel1
# channels defined
agent2.channels = channel1
agent2.channels.channel1.type = file
agent2.channels.channel1.checkpointDir = /data/cloudera/.flume/filechannel/checkpointdirs/pega
agent2.channels.channel1.dataDirs = /data/cloudera/.flume/filechannel/datadirs/pega
agent2.channels.channel1.capacity = 10000
agent2.channels.channel1.transactionCapacity = 10000
#hdfs sinks
agent2.sinks = sink
agent2.sinks.sink.type = hdfs
agent2.sinks.sink.hdfs.fileType = DataStream
agent2.sinks.sink.hdfs.path = hdfs://bigdata-cls:8020/stage/data/pega/campaign-response/%d%m%Y
agent2.sinks.sink.hdfs.batchSize = 1000
agent2.sinks.sink.hdfs.rollCount = 0
agent2.sinks.sink.hdfs.rollSize = 0
agent2.sinks.sink.hdfs.rollInterval = 120
agent2.sinks.sink.hdfs.useLocalTimeStamp = true
agent2.sinks.sink.hdfs.filePrefix = pega-
Output of df -h:
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 26G 6.8G 18G 28% /
devtmpfs 126G 0 126G 0% /dev
tmpfs 126G 6.3M 126G 1% /dev/shm
tmpfs 126G 2.9G 123G 3% /run
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/sda1 477M 133M 315M 30% /boot
tmpfs 26G 0 26G 0% /run/user/0
cm_processes 126G 1.9G 124G 2% /run/cloudera-scm-agent/process
/dev/scinib 2.0T 53G 1.9T 3% /data
tmpfs 26G 20K 26G 1% /run/user/2000
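To double-check, the channel directories themselves can be queried directly (the paths below are the ones from the configuration above), which should confirm they actually resolve to the /data mount:

df -h /data/cloudera/.flume/filechannel/checkpointdirs/pega
df -h /data/cloudera/.flume/filechannel/datadirs/pega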
Upvotes: 0
Views: 821
Reputation: 91
Change the channel type to a memory channel and test again to isolate the disk-space problem:

agent2.channels.channel1.type = memory
Also, since you already have Kafka in your setup, you can use it as the Flume channel; a sketch follows the link below.
https://flume.apache.org/FlumeUserGuide.html#kafka-channel
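As a rough sketch (the broker list and channel topic here are placeholders, not values from your setup), a Kafka channel replacing the file channel could look like this:

agent2.channels = channel1
agent2.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
# placeholder broker list; reuse your actual kafka.bootstrap.servers value
agent2.channels.channel1.kafka.bootstrap.servers = broker1:9092,broker2:9092
# placeholder topic; use a dedicated topic for the channel, not the source topic
agent2.channels.channel1.kafka.topic = flume-channel-pega

This removes the dependency on local disk entirely, since events are buffered in Kafka instead of in local data files.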
Upvotes: 1
Reputation: 7957
Your error doesn't point to free space in HDFS; it refers to free space on the local disk holding your file channel's data and checkpoint files. If you look at the file-channel documentation (https://flume.apache.org/FlumeUserGuide.html#file-channel) you will see that the default value of minimumRequiredSpace is 524288000 bytes (500 MB): the channel stops accepting events once usable space drops below that threshold. Check that the free local space is enough (it seems to be 0 based on your error). You can also change the minimumRequiredSpace property.
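For example, a minimal sketch, assuming the 500 MB default is what is tripping the check (the byte value below is an arbitrary illustration; pick one appropriate for your disk):

# lower the file channel's free-space threshold from the 524288000-byte default
agent2.channels.channel1.minimumRequiredSpace = 104857600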
Upvotes: 0