Kuntal-G

Reputation: 2991

Log data sent via Flume Avro not stored properly in Hive

I'm using Flume 1.5.0 to collect logs from application servers. Say I have three app servers: App-A, App-B, and App-C, plus one HDFS server where Hive is running. Flume agents run on all three app servers and pass the log messages to the HDFS server, where another Flume agent is running, and finally the logs are stored in the Hadoop file system. I have created an external Hive table to map those log data. Everything is working smoothly except that Hive is unable to parse the log data properly and store it in the table.

Here's my Flume and Hive configuration:

Dummy Log File Format (| separated): ClientId|App Request|URL
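A sample line in this format might look like (values made up):

1001|SEARCH|http://myapp.example.com/search?q=flume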

Flume conf at App servers:

app-agent.sources = tail
app-agent.channels = memoryChannel 
app-agent.sinks = avro-forward-sink 

app-agent.sources.tail.type = exec 
app-agent.sources.tail.command = tail -F /home/kuntal/practice/testing/application.log
app-agent.sources.tail.channels = memoryChannel


app-agent.channels.memoryChannel.type = memory
app-agent.channels.memoryChannel.capacity = 100000
app-agent.channels.memoryChannel.transactionCapacity = 10000

app-agent.sinks.avro-forward-sink.type = avro 
app-agent.sinks.avro-forward-sink.hostname = localhost
app-agent.sinks.avro-forward-sink.port = 10000
app-agent.sinks.avro-forward-sink.channel = memoryChannel
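As a quick sanity check of the Avro hop, a test file can be sent to the collector with the avro-client that ships with Flume (the file path is just the one from my tail command):

flume-ng avro-client -H localhost -p 10000 -F /home/kuntal/practice/testing/application.log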

Flume conf at the HDFS server:

hdfs-agent.sources = avro-collect
hdfs-agent.channels = memoryChannel 
hdfs-agent.sinks = hdfs-write 

hdfs-agent.sources.avro-collect.type = avro 
hdfs-agent.sources.avro-collect.bind = localhost
hdfs-agent.sources.avro-collect.port = 10000 
hdfs-agent.sources.avro-collect.channels = memoryChannel

hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 100000
hdfs-agent.channels.memoryChannel.transactionCapacity = 10000

hdfs-agent.sinks.hdfs-write.channel = memoryChannel
hdfs-agent.sinks.hdfs-write.type = hdfs 
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:9000/user/flume/tail_table/avro
hdfs-agent.sinks.hdfs-write.rollInterval = 30 

Hive external table:

CREATE EXTERNAL TABLE IF NOT EXISTS test(clientId int, itemType string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
LOCATION '/user/flume/tail_table/avro';
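One thing I suspect myself: since the log is '|' separated, the table delimiter should probably be '|' as well, with a third column for the URL, something like (just a guess on my side):

CREATE EXTERNAL TABLE IF NOT EXISTS test(clientId int, itemType string, url string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
LOCATION '/user/flume/tail_table/avro';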

Please suggest what I should do. Do I need to use AvroSerDe on the Hive side?

Upvotes: 0

Views: 973

Answers (2)

Kuntal-G

Reputation: 2991

The following 3 additional settings were missing in the HDFS sink:

hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30 
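With these added, the full sink section from the question becomes:

hdfs-agent.sinks.hdfs-write.channel = memoryChannel
hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:9000/user/flume/tail_table/avro
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30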

Without these, the data was not stored properly in HDFS (the sink's default fileType is SequenceFile, which wraps the text in a binary container), so Hive was unable to load it into the table. Now it's working fine!

Upvotes: 1

user1638758

Reputation: 11

You have made a small typo in the Flume config file on the HDFS server side.
All HDFS-related configuration keys must include the hdfs prefix in the property name.
So,

hdfs-agent.sinks.hdfs-write.rollInterval = 30

must be

hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30

For more info, refer to https://flume.apache.org/FlumeUserGuide.html#hdfs-sink.

Now check whether the file that landed on the HDFS side is correct. Print the contents of the file, for example with the cat command, to see if only the text you wanted to send is there. If any other gibberish shows up when you print the contents, then there is still a mistake in the config file.
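For example (the generated file name will differ; FlumeData is just the sink's default file prefix):

hdfs dfs -ls /user/flume/tail_table/avro
hdfs dfs -cat /user/flume/tail_table/avro/FlumeData.* | head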

Upvotes: 1
