Reputation: 37
We are struggling with data flow from Kafka to HDFS managed by Flume. Data is not fully transported to HDFS because of the exceptions described below. However, the error looks misleading to us: we have enough space both in the channel's data directory and in HDFS. We suspect a problem with the channel configuration, but we have a similar configuration for other sources and it works correctly for them. If anyone has had to deal with this problem, I would be grateful for tips.
17 Aug 2017 14:15:24,335 ERROR [Log-BackgroundWorker-channel1] (org.apache.flume.channel.file.Log$BackgroundWorker.run:1204) - Error doing checkpoint
java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288000 bytes
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1003)
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:986)
at org.apache.flume.channel.file.Log.access$200(Log.java:75)
at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:1201)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
17 Aug 2017 14:15:27,552 ERROR [PollableSourceRunner-KafkaSource-kafkaSource] (org.apache.flume.source.kafka.KafkaSource.doProcess:305) - KafkaSource EXCEPTION, {}
org.apache.flume.ChannelException: Commit failed due to IO error [channel=channel1]
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:639)
at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
at org.apache.flume.source.kafka.KafkaSource.doProcess(KafkaSource.java:286)
at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:58)
at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:137)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288026 bytes
at org.apache.flume.channel.file.Log.rollback(Log.java:722)
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:637)
... 6 more
Flume configuration:
agent2.sources = kafkaSource
#sources defined
agent2.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
agent2.sources.kafkaSource.kafka.bootstrap.servers = …
agent2.sources.kafkaSource.kafka.topics = pega-campaign-response
agent2.sources.kafkaSource.channels = channel1
# channels defined
agent2.channels = channel1
agent2.channels.channel1.type = file
agent2.channels.channel1.checkpointDir = /data/cloudera/.flume/filechannel/checkpointdirs/pega
agent2.channels.channel1.dataDirs = /data/cloudera/.flume/filechannel/datadirs/pega
agent2.channels.channel1.capacity = 10000
agent2.channels.channel1.transactionCapacity = 10000
#hdfs sinks
agent2.sinks = sink
agent2.sinks.sink.type = hdfs
agent2.sinks.sink.hdfs.fileType = DataStream
agent2.sinks.sink.hdfs.path = hdfs://bigdata-cls:8020/stage/data/pega/campaign-response/%d%m%Y
agent2.sinks.sink.hdfs.batchSize = 1000
agent2.sinks.sink.hdfs.rollCount = 0
agent2.sinks.sink.hdfs.rollSize = 0
agent2.sinks.sink.hdfs.rollInterval = 120
agent2.sinks.sink.hdfs.useLocalTimeStamp = true
agent2.sinks.sink.hdfs.filePrefix = pega-
Output of df -h:
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 26G 6.8G 18G 28% /
devtmpfs 126G 0 126G 0% /dev
tmpfs 126G 6.3M 126G 1% /dev/shm
tmpfs 126G 2.9G 123G 3% /run
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/sda1 477M 133M 315M 30% /boot
tmpfs 26G 0 26G 0% /run/user/0
cm_processes 126G 1.9G 124G 2% /run/cloudera-scm-agent/process
/dev/scinib 2.0T 53G 1.9T 3% /data
tmpfs 26G 20K 26G 1% /run/user/2000
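To double-check, the channel directories themselves can be queried directly (the paths below are the ones from the configuration above), which should confirm they actually resolve to the /data mount:

df -h /data/cloudera/.flume/filechannel/checkpointdirs/pega
df -h /data/cloudera/.flume/filechannel/datadirs/pega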
Upvotes: 0
Views: 821
Reputation: 91
Change the channel type to a memory channel and test again to isolate the disk-space problem:

agent2.channels.channel1.type = memory
Also, since you already have Kafka in your setup, you can use it as the Flume channel; a sketch follows the link below.
https://flume.apache.org/FlumeUserGuide.html#kafka-channel
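As a rough sketch (the broker list and channel topic here are placeholders, not values from your setup), a Kafka channel replacing the file channel could look like this:

agent2.channels = channel1
agent2.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
# placeholder broker list; reuse your actual kafka.bootstrap.servers value
agent2.channels.channel1.kafka.bootstrap.servers = broker1:9092,broker2:9092
# placeholder topic; use a dedicated topic for the channel, not the source topic
agent2.channels.channel1.kafka.topic = flume-channel-pega

This removes the dependency on local disk entirely, since events are buffered in Kafka instead of in local data files.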
Upvotes: 1
Reputation: 7957
Your error doesn't point to free space in HDFS; it refers to free space on the local disk holding your file channel's data and checkpoint files. If you look at the file-channel documentation (https://flume.apache.org/FlumeUserGuide.html#file-channel) you will see that the default value of minimumRequiredSpace is 524288000 bytes (500 MB): the channel stops accepting events once usable space drops below that threshold. Check that the free local space is enough (it seems to be 0 based on your error). You can also change the minimumRequiredSpace property.
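For example, a minimal sketch, assuming the 500 MB default is what is tripping the check (the byte value below is an arbitrary illustration; pick one appropriate for your disk):

# lower the file channel's free-space threshold from the 524288000-byte default
agent2.channels.channel1.minimumRequiredSpace = 104857600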
Upvotes: 0