Reputation: 9358
I'm creating a lead and event management system with Kafka. The problem is that we are getting many fake leads (advertisements). We also have many consumers in our system. Is there any way to filter out the advertisements before they reach the consumers? My solution is to write everything into a first topic, read it with a filtering consumer, and then write the filtered messages to a second topic. But I'm not sure whether that's efficient. Any ideas?
Upvotes: 15
Views: 45899
Reputation: 4703
Take a look at Confluent's KSQL. (It's free and open source: https://www.confluent.io/product/ksql/.) It uses Kafka Streams under the hood. You define your KSQL queries and tables on the server side, and their results are written to Kafka topics, so you can simply consume those topics instead of writing code for an intermediary filtering consumer. You'd only need to write the KSQL table "DDL" or queries.
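For illustration, such a query could look like the sketch below. The stream name, field names, and the `'advertisement'` marker are all hypothetical and would depend on your actual message schema:

```sql
-- Register the existing raw topic as a KSQL stream
-- (topic and column names here are hypothetical).
CREATE STREAM leads (id VARCHAR, source VARCHAR, message VARCHAR)
  WITH (KAFKA_TOPIC='leads', VALUE_FORMAT='JSON');

-- Continuously write only non-advertisement leads to a new topic.
CREATE STREAM clean_leads AS
  SELECT * FROM leads
  WHERE source <> 'advertisement';
```

Your existing consumers would then subscribe to the topic backing `clean_leads` instead of the raw one.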
Upvotes: 0
Reputation: 223
You can use Kafka Streams (http://kafka.apache.org/documentation.html#streamsapi) with Kafka 0.10.+. I think it's exactly for your use case.
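A minimal Kafka Streams topology for this is just a filter between two topics. Below is a hedged Java sketch using the current DSL (the 0.10-era API used `KStreamBuilder` rather than `StreamsBuilder`); the topic names, application id, and the string-based advertisement check are assumptions, not anything from the question. It also needs a running broker, so treat it as a shape, not a drop-in program.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class LeadFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "lead-filter"); // hypothetical id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // "raw-leads" and "clean-leads" are hypothetical topic names.
        KStream<String, String> raw = builder.stream("raw-leads");
        raw.filterNot((key, value) -> value.contains("\"advertisement\"")) // placeholder check
           .to("clean-leads");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The upside over a hand-rolled consumer/producer pair is that Kafka Streams manages offsets, scaling, and fault tolerance for you.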
Upvotes: 11
Reputation: 19
You can use Spark Streaming: https://spark.apache.org/docs/latest/streaming-kafka-integration.html.
Upvotes: 1
Reputation: 1863
Yes -- in fact, I am mostly convinced that this is the way you're supposed to handle a problem in your context. Because Kafka is only useful for the efficient transmission of data, there is nothing it can do by itself to clean your data. Consume everything with an intermediary consumer that runs its own tests to determine what passes its filter, and push what passes to a different topic / partition (based on your needs) to get clean data back out.
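The intermediary consumer described above boils down to a pure filtering step, which can be sketched independently of any Kafka client. In the minimal Python sketch below, the `is_advertisement` heuristic and the field names are assumptions; in a real deployment, `raw_messages` would be a consumer on the first topic and each yielded lead would be produced to the second, filtered topic.

```python
import json

def is_advertisement(lead: dict) -> bool:
    """Placeholder heuristic for spotting fake/advertisement leads.

    Real logic would encode whatever signals identify your fake leads
    (sender reputation, keywords, source metadata, ...).
    """
    text = lead.get("message", "").lower()
    return "buy now" in text or lead.get("source") == "ad-network"

def filter_leads(raw_messages):
    """Consume raw JSON messages and yield only genuine leads.

    In production this loop would read from a Kafka consumer on the
    first topic and send each surviving lead to the second topic via
    a producer.
    """
    for raw in raw_messages:
        lead = json.loads(raw)
        if not is_advertisement(lead):
            yield lead

raw = [
    '{"id": 1, "source": "web-form", "message": "Interested in a demo"}',
    '{"id": 2, "source": "ad-network", "message": "BUY NOW cheap leads!"}',
]
clean = list(filter_leads(raw))
print([lead["id"] for lead in clean])  # only the genuine lead remains
```

The efficiency cost the question worries about is essentially one extra produce/consume hop, which Kafka handles cheaply; the filter logic itself is the expensive part to get right.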
Upvotes: 6