Tymek

Reputation: 37

Flume "not enough space" error while data flows from Kafka to HDFS

We are struggling with a data flow from Kafka to HDFS managed by Flume. Data is not fully transported to HDFS because of the exceptions described below. However, this error looks misleading to us: we have enough space both in the data directory and in HDFS. We think it might be a problem with the channel configuration, but we have a similar configuration for other sources and it works correctly for them. If anyone has dealt with this problem, I would be grateful for tips.

17 Aug 2017 14:15:24,335 ERROR [Log-BackgroundWorker-channel1] (org.apache.flume.channel.file.Log$BackgroundWorker.run:1204)  - Error doing checkpoint
java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288000 bytes
        at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1003)
        at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:986)
        at org.apache.flume.channel.file.Log.access$200(Log.java:75)
        at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:1201)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
17 Aug 2017 14:15:27,552 ERROR [PollableSourceRunner-KafkaSource-kafkaSource] (org.apache.flume.source.kafka.KafkaSource.doProcess:305)  - KafkaSource EXCEPTION, {}
org.apache.flume.ChannelException: Commit failed due to IO error [channel=channel1]
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:639)
        at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
        at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
        at org.apache.flume.source.kafka.KafkaSource.doProcess(KafkaSource.java:286)
        at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:58)
        at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:137)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288026 bytes
        at org.apache.flume.channel.file.Log.rollback(Log.java:722)
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:637)
        ... 6 more

Flume configuration:

agent2.sources = kafkaSource

#sources defined
agent2.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
agent2.sources.kafkaSource.kafka.bootstrap.servers = …
agent2.sources.kafkaSource.kafka.topics = pega-campaign-response
agent2.sources.kafkaSource.channels = channel1

# channels defined
agent2.channels = channel1

agent2.channels.channel1.type = file
agent2.channels.channel1.checkpointDir = /data/cloudera/.flume/filechannel/checkpointdirs/pega
agent2.channels.channel1.dataDirs = /data/cloudera/.flume/filechannel/datadirs/pega
agent2.channels.channel1.capacity = 10000
agent2.channels.channel1.transactionCapacity = 10000

#hdfs sinks

agent2.sinks = sink

agent2.sinks.sink.type = hdfs
agent2.sinks.sink.hdfs.fileType = DataStream
agent2.sinks.sink.hdfs.path = hdfs://bigdata-cls:8020/stage/data/pega/campaign-response/%d%m%Y
agent2.sinks.sink.hdfs.batchSize = 1000
agent2.sinks.sink.hdfs.rollCount = 0
agent2.sinks.sink.hdfs.rollSize = 0
agent2.sinks.sink.hdfs.rollInterval = 120
agent2.sinks.sink.hdfs.useLocalTimeStamp = true
agent2.sinks.sink.hdfs.filePrefix = pega-

df -h command:

Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root   26G  6.8G   18G  28% /
devtmpfs               126G     0  126G   0% /dev
tmpfs                  126G  6.3M  126G   1% /dev/shm
tmpfs                  126G  2.9G  123G   3% /run
tmpfs                  126G     0  126G   0% /sys/fs/cgroup
/dev/sda1              477M  133M  315M  30% /boot
tmpfs                   26G     0   26G   0% /run/user/0
cm_processes           126G  1.9G  124G   2% /run/cloudera-scm-agent/process
/dev/scinib            2.0T   53G  1.9T   3% /data
tmpfs                   26G   20K   26G   1% /run/user/2000

Upvotes: 0

Views: 821

Answers (2)

srk

Reputation: 91

Change the channel type to a memory channel and test again, to isolate the disk space problem; see the sketch below.
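A minimal memory-channel sketch, reusing the capacity values from the question (these numbers are illustrative, not tuned):

agent2.channels.channel1.type = memory
agent2.channels.channel1.capacity = 10000
agent2.channels.channel1.transactionCapacity = 10000

Note that a memory channel trades durability for speed: events buffered in the channel are lost if the agent restarts, so this is for testing rather than a permanent fix.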

Also, since you already have Kafka in your setup, you can use it as the Flume channel; a sketch follows the link below.

https://flume.apache.org/FlumeUserGuide.html#kafka-channel
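A minimal Kafka-channel sketch, assuming a dedicated topic backs the channel (the topic name and consumer group id below are placeholders, and the broker list is elided as in the question):

agent2.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agent2.channels.channel1.kafka.bootstrap.servers = …
agent2.channels.channel1.kafka.topic = flume-channel-pega
agent2.channels.channel1.kafka.consumer.group.id = flume-pega

This removes the local-disk dependency entirely, since events are buffered in the Kafka topic instead of in files on the agent host.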

Upvotes: 1

hlagos

Reputation: 7957

Your error doesn't point to free space in HDFS; it is about the free space on the local disk holding your file channel's data and checkpoint directories. If you look at the file channel documentation (https://flume.apache.org/FlumeUserGuide.html#file-channel) you will see that the default value of minimumRequiredSpace is 524288000 bytes (500 MB). Check that the free local space is enough (it seems to be 0 based on your error). You can also lower the minimumRequiredSpace property, as sketched below.
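For example, to lower the threshold to roughly 100 MB (the value is illustrative; as far as I know Flume enforces a floor of 1048576 bytes, i.e. 1 MB, for this property):

agent2.channels.channel1.minimumRequiredSpace = 104857600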

Upvotes: 0
