I'm trying to understand how a simple data-enrichment process works in Apache Beam.
I've drawn a first rough diagram, but I'm not sure how to address this.
I've seen some examples that use CoGroupByKey or a lambda, but I'm not sure which one applies here and I feel a little lost.
Am I on the right track with this approach? Where could I find some examples to understand it better?
Thanks a lot!
It depends on what you are trying to do. If the two datasets you want to combine share a common value (a key), I would use CoGroupByKey. However, this does not always work with streamed data. In that case, you will need to use side inputs, and then you can merge the data with a lambda expression or GroupByKey.
You can look at this example of CoGroupByKey. This is an example of a lambda, and this documentation does a really good job of explaining the functions you can use with Apache Beam through Python.