Manikandan Kannan

Reputation: 9024

Replace LogStash with Spark Streaming

My requirement is to read log data from multiple machines. As far as I understand, LogStash agents have to be installed on all the machines, and LogStash can push data to Kafka as and when it arrives, i.e. even if a new line is added to a file, LogStash reads only that line, not the entire file again.

Questions

  1. Is it possible to achieve the same with Spark Streaming?

  2. If so, what are the advantages/disadvantages of using Spark Streaming over LogStash?

Upvotes: 0

Views: 795

Answers (1)

OneCricketeer

Reputation: 191884

LogStash agents to be installed on all the machines

Yes, you need some agent on all machines. The solution in the ELK stack is actually Filebeat, not Logstash agents. Logstash is more of a server/message bus in this scenario.

Similarly, some Spark job would need to be running on each machine to read a file. Personally, I would have anything else tailing a log file (even literally just tail -f file.log piped out to a network socket). Needing to write and distribute a Spark JAR plus config files is a clear disadvantage, especially when you need to have Java installed on each of the machines you are collecting logs from.
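To show how little is needed for the "tail -f piped to a socket" idea, here is a minimal Python sketch using only the standard library. The names follow and ship, and the from_start and max_idle_polls parameters, are made up for this illustration (max_idle_polls only exists so the loop can stop in a demo). It does not handle log rotation and is not a replacement for Filebeat or Fluentd.

```python
import os
import time


def follow(path, from_start=False, poll_interval=0.5, max_idle_polls=None):
    """Yield complete lines (as bytes) appended to `path`, roughly like `tail -f`.

    from_start and max_idle_polls are hypothetical knobs for this sketch:
    leave max_idle_polls as None to follow the file forever.
    """
    idle_polls = 0
    with open(path, "rb") as f:
        if not from_start:
            f.seek(0, os.SEEK_END)  # skip existing content, like tail -f
        while max_idle_polls is None or idle_polls < max_idle_polls:
            pos = f.tell()
            line = f.readline()
            if line.endswith(b"\n"):
                idle_polls = 0
                yield line
            else:
                # Nothing new, or a half-written line: rewind and wait.
                f.seek(pos)
                idle_polls += 1
                time.sleep(poll_interval)


def ship(path, sock, **follow_kwargs):
    """Send every followed line over an already connected socket."""
    sent = 0
    for line in follow(path, **follow_kwargs):
        sock.sendall(line)
        sent += 1
    return sent
```

In practice the socket would come from something like socket.create_connection((host, port)), where host and port point at whatever collector you run (placeholders here, not values from the question).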

Flume and Fluentd are other widely used options for distributed log collection with Kafka destinations.

LogStash can push data to Kafka

The Beats framework has a Kafka Output, but you can also ship to Logstash first.

It's not clear if you are using LogStash purely for Kafka, or also using Elasticsearch here, but Kafka Connect has a file source connector (and there is an Elasticsearch sink connector for it).

reads only that not the entire file again

Whatever tool you use (including Spark Streaming's file source) will typically be watching directories of files (because if you aren't rotating log files, you're doing it wrong). As files come in, or as bytes are written to a file, that framework needs to commit some type of marker internally to indicate what has been consumed so far. To reset the agent, you should be able to remove or reset this metadata so that it starts from the beginning.
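As a rough illustration of that marker, here is a Python sketch that stores a byte offset in a small JSON checkpoint file. The function name read_new_lines and the checkpoint layout are assumptions made for this example, not how Filebeat, Logstash, or Spark actually store their state, and it tracks a single file rather than a directory.

```python
import json
import os


def read_new_lines(log_path, checkpoint_path):
    """Return the complete lines appended since the last committed offset."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as cp:
            offset = json.load(cp)["offset"]

    # If the file is now shorter than the marker, assume it was
    # truncated or rotated and start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # half-written line; pick it up on the next call
            lines.append(raw.decode("utf-8").rstrip("\n"))
            offset += len(raw)

    # Commit the marker: write a temp file, then rename it into place.
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as cp:
        json.dump({"offset": offset}, cp)
    os.replace(tmp_path, checkpoint_path)
    return lines
```

Calling it repeatedly returns only the lines added since the previous call, and deleting the checkpoint file makes the next call start from the top of the log. Note that this sketch commits the offset before the caller has done anything with the lines, so a crash right after the call could lose them; committing only after the lines are delivered would trade that for possible duplicates.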

Upvotes: 1
