Reputation: 145
I am developing a streaming app with Apache Spark. The app receives sensor data by subscribing to a Kafka topic named sensor. The purpose of the app is to filter the sensor data, transform it and publish it back to a different Kafka topic named people for other consumers. The messages in topic people must have the same order as they arrived in topic sensor. Thus, I am currently using only one partition in Kafka.
Here's my code:
val myStream = KafkaUtils.createDirectStream[K, V](streamingContext, PreferConsistent, Subscribe[K, V](topics, consumerConfig))

def process(record: (RDD[ConsumerRecord[String, String]], Time)): Unit = record match {
  case (rdd, time) if !rdd.isEmpty =>
    // More Code...
    // Filter RDD, transform to JSON, build Seq[People]...
    // In the end, I have: Dataset[People]
    // Publish to Kafka topic 'people'
  case _ =>
}

myStream.foreachRDD((x, y) => process((x, y)))
Today, I asked a question on how to achieve the correct ordering in Spark after transforming the data into my People data structure.
The answer indicated that using Spark with a single partition is not wise and that this actually might be a design flaw:
Unless you have a single partition (and then you wouldn't use Spark, would you?) the order...
I am now wondering whether I can improve the overall design of my application (change the map-reduce flow) or if Spark is not a good fit for my use case.
Upvotes: 2
Views: 96
Reputation: 7386
In your case Kafka is not the right choice. Kafka only maintains a total order over the messages within a single partition. The parallelism and scalability of Kafka depend purely on the number of partitions of a particular topic. The flaw lies entirely in the design.
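To make the per-partition ordering point concrete, here is a minimal sketch in plain Scala. It is a simplified model, not Kafka's actual code: the names Message, partitionFor and route are hypothetical, and String.hashCode stands in for the hashing done by Kafka's default partitioner.

```scala
// Simplified model of a partitioned topic; this is not Kafka's real partitioner.
// Message, partitionFor and route are hypothetical names used for illustration.
final case class Message(key: String, seq: Int)

object PartitionOrderSketch {
  // Stand-in for Kafka's key hashing (assumption: hashCode replaces the real hash).
  def partitionFor(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Append each message to the log of its partition, in arrival order.
  def route(messages: Seq[Message], numPartitions: Int): Map[Int, Vector[Message]] =
    messages.foldLeft(Map.empty[Int, Vector[Message]]) { (logs, m) =>
      val p = partitionFor(m.key, numPartitions)
      logs.updated(p, logs.getOrElse(p, Vector.empty) :+ m)
    }
}
```

With one partition the single log is the arrival order. With several partitions each log is still ordered internally, but nothing relates the position of a message in one log to a message in another, so a consumer reading all partitions has no global order to rely on.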
If you really want to preserve the order, you can include an epoch timestamp in your data and, once you have transformed the data, sort it by that timestamp before storing it.
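A minimal sketch of that idea in plain Scala, assuming each record carries its arrival time. The People fields shown here (name, epochMillis) and the helper restoreOrder are hypothetical, since the question does not show the actual type.

```scala
// Hypothetical record type; the field names are assumptions for illustration.
final case class People(name: String, epochMillis: Long)

object TimestampOrdering {
  // sortBy is a stable sort, so records sharing a timestamp keep their relative order.
  def restoreOrder(batch: Seq[People]): Seq[People] =
    batch.sortBy(_.epochMillis)
}
```

For example, TimestampOrdering.restoreOrder(Seq(People("b", 2000L), People("a", 1000L))) returns the record named "a" first. On a Dataset[People] the comparable step would be an orderBy on the timestamp column. Note that sorting inside one micro-batch does not by itself order records across batches.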
Upvotes: 0
Reputation: 11
While this is primarily opinion-based, you are using tools which are designed for:
to solve a problem defined as:
where:
would be perfectly sufficient.
So subjectively speaking there is a serious design flaw here.
Upvotes: 1