Reputation: 123
We have a Kafka stream of events that we want to enrich using some metadata that resides in a MySQL DB.
The metadata changes every few hours. Essentially, we want to periodically re-read the DB and keep enriching the events with the fresh metadata.
One way could be to use broadcast state with a periodic source that reads the DB every few minutes/hours, broadcast that stream, and use it for the join. But the problem is that the first read of the broadcast stream can arrive later than some of the messages read from the Kafka stream, so those early events would see no metadata. A rough sketch of the idea is below.
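(Names like `Event`, `Metadata`, `EnrichedEvent`, `fetchMetadataFromMySql()`, and `enrich()` are placeholders, not our real code; `env` and `kafkaEvents` are assumed to already exist.)

```java
import java.time.Duration;

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.util.Collector;

// Descriptor for the broadcast state that holds the metadata, keyed by id.
MapStateDescriptor<String, Metadata> metadataDescriptor =
        new MapStateDescriptor<>("metadata", Types.STRING, TypeInformation.of(Metadata.class));

// Periodic source (legacy SourceFunction API): re-reads the MySQL table
// every hour and re-emits every row.
DataStream<Metadata> metadataStream = env.addSource(new RichSourceFunction<Metadata>() {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Metadata> ctx) throws Exception {
        while (running) {
            for (Metadata m : fetchMetadataFromMySql()) {  // plain JDBC query
                ctx.collect(m);
            }
            Thread.sleep(Duration.ofHours(1).toMillis());
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
});

BroadcastStream<Metadata> broadcast = metadataStream.broadcast(metadataDescriptor);

DataStream<EnrichedEvent> enriched = kafkaEvents
        .connect(broadcast)
        .process(new BroadcastProcessFunction<Event, Metadata, EnrichedEvent>() {
            @Override
            public void processElement(Event event, ReadOnlyContext ctx,
                                       Collector<EnrichedEvent> out) throws Exception {
                Metadata m = ctx.getBroadcastState(metadataDescriptor).get(event.key);
                // m is null for events that arrive before the first DB read;
                // this is exactly the problem described above.
                out.collect(enrich(event, m));
            }

            @Override
            public void processBroadcastElement(Metadata m, Context ctx,
                                                Collector<EnrichedEvent> out) throws Exception {
                ctx.getBroadcastState(metadataDescriptor).put(m.key, m);
            }
        });
```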
Is there any better way?
Upvotes: 0
Views: 696
Reputation: 43697
You could use Flink SQL for this. Depending on the exact requirements, you might either do temporal (time-versioned) joins against a CDC stream from the MySQL DB, or do lookup joins against MySQL (perhaps with the optional lookup cache enabled).
See also https://github.com/ververica/flink-cdc-connectors.
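For example, a lookup join against MySQL might look roughly like this. This is a sketch only: the schema, column names, and connection settings are made up, the MySQL JDBC driver is assumed to be on the classpath, and `events` is assumed to be a Kafka-backed table declared elsewhere with a processing-time attribute (`proc_time AS PROCTIME()`).

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv =
        TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

// Dimension table backed directly by MySQL via the JDBC connector.
// The lookup cache keeps the query rate against MySQL down; cached rows
// expire after the TTL, which fits metadata that changes every few hours.
tEnv.executeSql(
    "CREATE TABLE metadata (" +
    "  id STRING," +
    "  category STRING," +
    "  PRIMARY KEY (id) NOT ENFORCED" +
    ") WITH (" +
    "  'connector' = 'jdbc'," +
    "  'url' = 'jdbc:mysql://localhost:3306/mydb'," +
    "  'table-name' = 'metadata'," +
    "  'username' = '...'," +
    "  'password' = '...'," +
    "  'lookup.cache.max-rows' = '10000'," +
    "  'lookup.cache.ttl' = '1h'" +
    ")");

// Lookup join: each event triggers a (possibly cached) point query on MySQL.
tEnv.executeSql(
    "SELECT e.id, e.payload, m.category " +
    "FROM events AS e " +
    "JOIN metadata FOR SYSTEM_TIME AS OF e.proc_time AS m " +
    "ON e.id = m.id");
```

With a lookup join there is no "first read arrives too late" problem, since the metadata is queried on demand per event; the cache TTL bounds how stale the enrichment can be.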
Update:
If you want to use the DataStream API, but are concerned that some of the Kafka messages might be processed before the corresponding data from the broadcast stream is available, you can:
- in the open() method of your enrichment function, do an initial query against MySQL to preload the metadata (sketch below), or
- use some default value hardwired into the code

Alternatively, you could use the state processor API to bootstrap values for the broadcast state.
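Here is a minimal sketch of the first option, reusing the hypothetical Event/Metadata/EnrichedEvent types and metadata table from above. The snapshot loaded in open() is only used as a fallback until the broadcast state catches up:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class EnrichmentFunction
        extends BroadcastProcessFunction<Event, Metadata, EnrichedEvent> {

    static final MapStateDescriptor<String, Metadata> METADATA_DESCRIPTOR =
            new MapStateDescriptor<>("metadata", Types.STRING, TypeInformation.of(Metadata.class));

    // Snapshot loaded once per task at startup; not checkpointed state.
    private transient Map<String, Metadata> preloaded;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Blocking one-time query against MySQL, done before any events flow.
        preloaded = new HashMap<>();
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/mydb", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, category FROM metadata")) {
            while (rs.next()) {
                preloaded.put(rs.getString("id"),
                        new Metadata(rs.getString("id"), rs.getString("category")));
            }
        }
    }

    @Override
    public void processElement(Event event, ReadOnlyContext ctx,
                               Collector<EnrichedEvent> out) throws Exception {
        Metadata m = ctx.getBroadcastState(METADATA_DESCRIPTOR).get(event.key);
        if (m == null) {
            // Broadcast state not populated yet: fall back to the preloaded snapshot.
            m = preloaded.get(event.key);
        }
        out.collect(enrich(event, m));
    }

    @Override
    public void processBroadcastElement(Metadata m, Context ctx,
                                        Collector<EnrichedEvent> out) throws Exception {
        ctx.getBroadcastState(METADATA_DESCRIPTOR).put(m.key, m);
    }

    private EnrichedEvent enrich(Event event, Metadata m) {
        // Hypothetical helper; combine the event with its metadata here.
        return new EnrichedEvent(event, m);
    }
}
```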
Upvotes: 0