Igor

Reputation: 1464

How do I read from a Kafka Table filtering by a specific field?

I have a Kafka table, let's call it "MY_TABLE", whose records have a structure something like this:

{
    "ROWTIME":123456,
    "ROWKEY":"3_1234_all",
    "id":1,
    "provider_id":3,
    "person_id":"1234"
}

This Kafka table holds a lot of data for different values of provider_id. I need to retrieve all distinct person_ids from it where provider_id = 3.

I'm new to kafka and found this approach here: https://kafka-tutorials.confluent.io/filter-a-stream-of-events/kstreams.html#consume-filtered-events-from-the-output-topic

But I'm not sure I actually need a new topic just to hold the filtered data that I'll be using inside the application. I would need to read these results once every few hours in order to build a query that filters by person_id.

It's a Spring Boot application, by the way, so I'll be reading it in Java.

Upvotes: 0

Views: 1076

Answers (1)

user152468

Reputation: 3242

Not sure what you mean by "Kafka Table". Kafka itself only knows topics; tables are a concept of KSQL and Kafka Streams.

Assuming you meant "Kafka topic", the link you provided is already a good starting point.

Yet you do not need a different topic for your output if you simply want to access the data in your application.

You could define a KTable on the original topic, and then use interactive queries to access the state store that backs the KTable.

This is a sketch of how you would define the KTable:

StreamsBuilder builder = new StreamsBuilder();
KTable<String, Person> persons = builder.<String, Person>stream("my_topic").filter((key, p) -> p.provider_id == 3).toTable();

You will have to configure the correct serializers and deserializers for the topology above. So you will need to implement a PersonDeserializer for reading from the topic, and also a PersonSerializer for writing the data to the state store.

More information on how to use interactive queries to access the state store is found here:

Regarding your plan to reread the data every few hours, this seems like an anti-pattern to me. Kafka topics are meant to be read from beginning to end, and only in rare circumstances should you read the same data multiple times with the same consumer. Rather than rereading the data, you should build up a materialized view of the data (as shown with the KTable above), and then query that materialized view.
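To make the "materialized view" idea a bit more concrete, here is a minimal plain-Java sketch with no Kafka dependency. It is only an illustration of the concept, not how Kafka Streams implements its state store: the ProviderView class, its onRecord and distinctPersonIds methods, and the Person class are all hypothetical names, and the map simply stands in for the store behind the KTable. Each incoming record is applied once as it arrives, and your application can then ask the view for the distinct person_ids whenever it likes without rereading anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ProviderView {
    // Hypothetical value type mirroring the fields shown in the question.
    public static final class Person {
        public final int id;
        public final int providerId;
        public final String personId;

        public Person(int id, int providerId, String personId) {
            this.id = id;
            this.providerId = providerId;
            this.personId = personId;
        }
    }

    // Stand-in for the state store: latest matching value per record key.
    private final Map<String, Person> store = new LinkedHashMap<>();
    private final int providerId;

    public ProviderView(int providerId) {
        this.providerId = providerId;
    }

    // Apply one incoming record. Records for other providers are ignored,
    // matching records overwrite any earlier value with the same key.
    public void onRecord(String key, Person value) {
        if (value != null && value.providerId == providerId) {
            store.put(key, value);
        }
    }

    // Query the view: the distinct person_ids seen so far, sorted.
    public Set<String> distinctPersonIds() {
        Set<String> ids = new TreeSet<>();
        for (Person p : store.values()) {
            ids.add(p.personId);
        }
        return ids;
    }

    public static void main(String[] args) {
        ProviderView view = new ProviderView(3);
        view.onRecord("3_1234_all", new Person(1, 3, "1234"));
        view.onRecord("4_9999_all", new Person(2, 4, "9999"));
        view.onRecord("3_5678_all", new Person(3, 3, "5678"));
        System.out.println(view.distinctPersonIds());
    }
}
```

In the real application the role of onRecord is played by the Kafka Streams topology, and the role of distinctPersonIds by an interactive query that iterates over the state store.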

Hope this helps.

Upvotes: 1
