Reputation: 8923
I am trying to set up a syslog Flume agent that should eventually put the data into HDFS.
My scenario is as follows:
The syslog Flume agent is running on physical server A, with the following configuration:
===
syslog_agent.sources = syslog_source
syslog_agent.channels = MemChannel
syslog_agent.sinks = HDFS
# Describing/Configuring the source
syslog_agent.sources.syslog_source.type = syslogudp
#syslog_agent.sources.syslog_source.bind = 0.0.0.0
syslog_agent.sources.syslog_source.bind = localhost
syslog_agent.sources.syslog_source.port = 514
# Describing/Configuring the sink
syslog_agent.sinks.HDFS.type=hdfs
syslog_agent.sinks.HDFS.hdfs.path=hdfs://<IP_ADD_OF_NN>:8020/user/ec2-user/syslog
syslog_agent.sinks.HDFS.hdfs.fileType=DataStream
syslog_agent.sinks.HDFS.hdfs.writeFormat=Text
syslog_agent.sinks.HDFS.hdfs.batchSize=1000
syslog_agent.sinks.HDFS.hdfs.rollSize=0
syslog_agent.sinks.HDFS.hdfs.rollCount=10000
syslog_agent.sinks.HDFS.hdfs.rollInterval=600
# Describing/Configuring the channel
syslog_agent.channels.MemChannel.type=memory
syslog_agent.channels.MemChannel.capacity=10000
syslog_agent.channels.MemChannel.transactionCapacity=1000
#Bind sources and sinks to the channel
syslog_agent.sources.syslog_source.channels = MemChannel
syslog_agent.sinks.HDFS.channel = MemChannel
From server-B, I send test messages with:
sudo logger --server <IP_Address_physical_server_A> --port 514 --udp
I do see the log messages arriving on physical server-A, in /var/log/messages.
But I don't see any messages going into HDFS; it seems the Flume agent isn't able to get any data, even though the messages are going from server-B to server-A.
Am I doing something wrong here? Can anyone help me resolve this?
EDIT
The following is the output of netstat on server-A, where the syslog daemon is running:
tcp 0 0 0.0.0.0:514 0.0.0.0:* LISTEN 573/rsyslogd
tcp6 0 0 :::514 :::* LISTEN 573/rsyslogd
udp 0 0 0.0.0.0:514 0.0.0.0:* 573/rsyslogd
udp6 0 0 :::514 :::* 573/rsyslogd
Upvotes: 0
Views: 216
Reputation: 191874
I'm not sure what logger --server gives you, but most examples I have seen use netcat.
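For instance, a hand-crafted syslog line can be sent straight at the agent's UDP port with netcat. This is only a sketch: <IP_Address_physical_server_A> is the placeholder from the question, and the `<13>` priority and message text are arbitrary test values.

```shell
# Build a minimal RFC 3164-style syslog line: <priority>timestamp host tag: message
MSG="<13>$(date '+%b %e %H:%M:%S') server-B test: hello flume"

# Fire it at the Flume syslogudp source (-u = UDP, -w1 = give up after 1 second)
echo "$MSG" | nc -u -w1 <IP_Address_physical_server_A> 514
```

If this message shows up in HDFS but logger's messages don't, the problem is on the logger/rsyslog side rather than in the Flume agent.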
In any case, you've set batchSize=1000, so Flume will not write to HDFS until it has received 1000 messages.
Keep in mind, HDFS is not a streaming platform, and prefers not to have small files.
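For quick testing you could lower the batch and roll thresholds so events appear in HDFS almost immediately. A sketch only; these values replace the ones in the question's config and are far too small for production, where they would produce exactly the tiny files HDFS dislikes:

```
syslog_agent.sinks.HDFS.hdfs.batchSize = 1
syslog_agent.sinks.HDFS.hdfs.rollCount = 10
syslog_agent.sinks.HDFS.hdfs.rollInterval = 30
```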
If you're looking for log collection, look into Elasticsearch or Solr fronted by a Kafka topic.
Upvotes: 1