Jelly

Reputation: 1310

Oozie action triggered by Kafka messages

The task is to implement the following workflow:

  1. A Kafka consumer reads a message containing file metadata from a topic.

  2. The file specified in the metadata is copied from one filesystem (not HDFS) to another filesystem (not HDFS) and unpacked.

  3. A Spark job reads this file, processes it, and writes the result to HDFS.

    The Spark job is run by an Oozie job.

How can I coordinate phases 1 and 2 (kafkaRead, fileRead) with phase 3?

NiFi comes to mind.

But maybe I'm missing something.

Upvotes: 0

Views: 109

Answers (1)

OneCricketeer

Reputation: 191914

NiFi or Spark Streaming alone should be used for this, since it is a streaming application. Spark Streaming jobs are long-running applications and do not need to be started by Oozie.

Or, you can use Kafka Streams for a simpler deployment pattern.

The only way you would use Oozie for this (since it is batch) is to use a scheduled action, limit your Kafka consumer to reading only N messages per run, and commit the offsets of the messages you processed.
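
The bounded, batch-style read could look roughly like the sketch below. It is not tied to any Kafka client library: records is assumed to be an iterable of (offset, value) pairs from a single partition, and process_batch, handle and max_messages are hypothetical names chosen for illustration. The function returns the offset to commit, which in Kafka is the offset of the next message to read (last processed offset + 1).

```python
from itertools import islice
from typing import Callable, Iterable, Optional, Tuple


def process_batch(
    records: Iterable[Tuple[int, str]],
    handle: Callable[[str], None],
    max_messages: int,
) -> Optional[int]:
    """Handle at most max_messages records and return the offset to commit.

    Returns None if no record was processed, in which case nothing
    should be committed.
    """
    offset_to_commit = None
    # islice stops after max_messages items without consuming any extra
    # record from the underlying iterator.
    for offset, value in islice(records, max_messages):
        handle(value)
        # Kafka convention: commit the offset of the NEXT message to read.
        offset_to_commit = offset + 1
    return offset_to_commit
```

A scheduled run would call this once, commit the returned offset through whatever Kafka client it uses, and exit. If handle raises, nothing is returned or committed by this function, so the next scheduled run would start again from the last committed offset.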

You don't need to coordinate kafkaRead and fileRead since you would do something like readFile(consumerRecord.value().getFilePath()) while reading the Kafka data...
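
As a minimal sketch of that idea, the handler below does the copy and unpack steps for one message value. Several things here are assumptions rather than details from the question: the message value is taken to be JSON with a file_path field, the archive is in a format that shutil.unpack_archive recognises by extension (for example .zip or .tar.gz), and both filesystems are reachable as local or mounted paths. The name copy_and_unpack is hypothetical.

```python
import json
import shutil
from pathlib import Path


def copy_and_unpack(message_value: str, dest_dir: Path) -> Path:
    """Copy the file named in the message to dest_dir and unpack it there.

    Returns the directory that holds the unpacked contents.
    """
    metadata = json.loads(message_value)
    source = Path(metadata["file_path"])  # assumed field name

    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 returns the path of the newly created file inside dest_dir.
    copied = Path(shutil.copy2(source, dest_dir))

    # Unpack next to the copied archive; the format is inferred from
    # the file extension.
    unpack_dir = dest_dir / (copied.name + ".unpacked")
    unpack_dir.mkdir(exist_ok=True)
    shutil.unpack_archive(str(copied), str(unpack_dir))
    return unpack_dir
```

Because the file work happens inside the per-message handler, the Kafka read and the file copy cannot get out of step with each other.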

Also, you should not be using Spark to copy files. You should use a native filesystem client for your "non-HDFS" system. If you want to separate the "unpack" step from the "copy to HDFS" step, you can use a secondary Kafka topic to notify a new stream-processing application that "unpacked data at location is ready to be copied to destination".

Upvotes: 0
