Reputation: 7107
I'm trying to transfer a 700 MB log file from Flume to HDFS. I have configured the Flume agent as follows:
...
tier1.channels.memory-channel.type = memory
...
tier1.sinks.hdfs-sink.channel = memory-channel
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.hdfs.path = hdfs://***
tier1.sinks.hdfs-sink.hdfs.fileType = DataStream
tier1.sinks.hdfs-sink.hdfs.rollSize = 0
The source is a spooldir, the channel is memory, and the sink is hdfs. I have also tried to send a 1 MB file, and Flume split it into 1000 files of 1 KB each. Another thing I noticed is that the transfer was very slow: 1 MB took about a minute. Am I doing something wrong?
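For completeness, here is roughly the full agent configuration (the spool directory and channel capacities below are placeholder values; the HDFS path is masked as above):

tier1.sources = spool-source
tier1.channels = memory-channel
tier1.sinks = hdfs-sink

# spooling-directory source; /var/log/spool is a placeholder path
tier1.sources.spool-source.type = spooldir
tier1.sources.spool-source.spoolDir = /var/log/spool
tier1.sources.spool-source.channels = memory-channel

# memory channel; capacities are placeholder values
tier1.channels.memory-channel.type = memory
tier1.channels.memory-channel.capacity = 10000
tier1.channels.memory-channel.transactionCapacity = 1000

# HDFS sink
tier1.sinks.hdfs-sink.channel = memory-channel
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.hdfs.path = hdfs://***
tier1.sinks.hdfs-sink.hdfs.fileType = DataStream
tier1.sinks.hdfs-sink.hdfs.rollSize = 0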
Upvotes: 1
Views: 1435
Reputation: 2759
You need to disable the other rollover triggers too; that's done with the following settings:
tier1.sinks.hdfs-sink.hdfs.rollCount = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300
Setting rollCount to 0 prevents event-count-based rollovers; rollInterval here is set to 300 seconds, and setting it to 0 instead would disable time-based rollovers as well. You will have to choose which mechanism should trigger your rollovers, otherwise Flume will only close the files upon shutdown.
The default values are the following:
hdfs.rollInterval   30     Number of seconds to wait before rolling the current file (0 = never roll based on time interval)
hdfs.rollSize       1024   File size to trigger a roll, in bytes (0 = never roll based on file size)
hdfs.rollCount      10     Number of events written to a file before it is rolled (0 = never roll based on number of events)
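Putting it together: if you want purely time-based rolling, the relevant sink settings would look like the sketch below (to roll on size instead, give hdfs.rollSize a byte threshold and zero out hdfs.rollInterval):

# 0 = never roll based on file size
tier1.sinks.hdfs-sink.hdfs.rollSize = 0
# 0 = never roll based on event count
tier1.sinks.hdfs-sink.hdfs.rollCount = 0
# close and roll the current file every 300 seconds
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300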
Upvotes: 3