Reputation: 2991
I'm using Flume 1.5.0 to collect logs from application servers. Say I have three app servers: App-A, App-B, and App-C, and one HDFS server where Hive is running. Flume agents run on all three app servers and pass the log messages to the HDFS server, where another Flume agent is running, and the logs are finally stored in the Hadoop file system. I have created an external Hive table to map that log data. Everything works smoothly except that Hive is unable to parse the log data properly and store it in the table.
Here's my Flume and Hive configuration:
Dummy Log File Format (| separated): ClientId|App Request|URL
Flume conf at App servers:
app-agent.sources = tail
app-agent.channels = memoryChannel
app-agent.sinks = avro-forward-sink
app-agent.sources.tail.type = exec
app-agent.sources.tail.command = tail -F /home/kuntal/practice/testing/application.log
app-agent.sources.tail.channels = memoryChannel
app-agent.channels.memoryChannel.type = memory
app-agent.channels.memoryChannel.capacity = 100000
app-agent.channels.memoryChannel.transactionCapacity = 10000
app-agent.sinks.avro-forward-sink.type = avro
app-agent.sinks.avro-forward-sink.hostname = localhost
app-agent.sinks.avro-forward-sink.port = 10000
app-agent.sinks.avro-forward-sink.channel = memoryChannel
Flume conf at HDFS server:
hdfs-agent.sources = avro-collect
hdfs-agent.channels = memoryChannel
hdfs-agent.sinks = hdfs-write
hdfs-agent.sources.avro-collect.type = avro
hdfs-agent.sources.avro-collect.bind = localhost
hdfs-agent.sources.avro-collect.port = 10000
hdfs-agent.sources.avro-collect.channels = memoryChannel
hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 100000
hdfs-agent.channels.memoryChannel.transactionCapacity = 10000
hdfs-agent.sinks.hdfs-write.channel = memoryChannel
hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:9000/user/flume/tail_table/avro
hdfs-agent.sinks.hdfs-write.rollInterval = 30
Hive external table:
CREATE EXTERNAL TABLE IF NOT EXISTS test(clientId int, itemType string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
LOCATION '/user/flume/tail_table/avro';
Please suggest what I should do. Do I need to include an AvroSerde on the Hive side?
Upvotes: 0
Views: 973
Reputation: 2991
The following 3 additional settings were missing in the HDFS sink:
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30
Hence the data was not stored properly in HDFS and Hive was unable to load it into the table. Now it's working fine!
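For reference, a consolidated sketch of what the complete HDFS sink section might look like with these settings in place (path and values taken from the config above; adjust to your environment):
hdfs-agent.sinks.hdfs-write.channel = memoryChannel
hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:9000/user/flume/tail_table/avro
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30
With fileType = DataStream and writeFormat = Text, the sink writes the raw event body as plain text rather than a SequenceFile, so a plain delimited external table should work on the Hive side without an AvroSerde.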
Upvotes: 1
Reputation: 11
You have made a small typo in the Flume config file on the HDFS server side.
For all HDFS-related configurations, we must include the hdfs prefix in the property names.
So,
hdfs-agent.sinks.hdfs-write.rollInterval = 30
must be
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30
For more info, refer to https://flume.apache.org/FlumeUserGuide.html#hdfs-sink.
Now check whether the file that landed on the HDFS side is proper. Try printing the contents of the file, for example with the cat command (see the sketch below), to see whether only the text you wanted to send is there. If any other gibberish still shows up when you print the contents, there is a mistake in the config file.
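A minimal check might look like this, assuming the HDFS path from the config above; the actual file names depend on the sink's hdfs.filePrefix (FlumeData by default):
hdfs dfs -ls /user/flume/tail_table/avro
hdfs dfs -cat /user/flume/tail_table/avro/FlumeData.*
If the sink is writing plain text, each line should look like the original log format, e.g. ClientId|App Request|URL, with no binary header bytes around it.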
Upvotes: 1