sanyi14ka

Reputation: 829

Join a rapidly changing and a slowly changing unbounded source in Apache Beam

I have two unbounded sources (pubsub):

I want to enrich (left join) the main source with the table read based on the secondary source.

I already have a solution in which the BigQuery tables are read once at the beginning, so they are bounded. The join is quite complex, so I used Beam SQL and I want to keep it. That is why I think a side input is not feasible: as far as I know, I cannot join a PCollection with a PCollectionView in Beam SQL.

I tried 5-second fixed windows on each source, but for the second source the last state is not propagated to windows in which nothing has changed. As a result, after joining the sources I get the right results only in windows where the BigQuery table was updated. When nothing has changed (most of the time), I get null values on the right side.

How can I upsample the second source to get the right results after the join?

Upvotes: 0

Views: 54

Answers (1)

jggp1094

Reputation: 180

I think using fixed-size windows with unbounded sources isn't ideal for this scenario, as you've discovered. The problem is that your secondary source's infrequent updates are lost whenever they don't fall within a window that also contains events from the main source. Simply upsampling the secondary source won't fix the underlying problem. It will just create many redundant copies of the same BigQuery data, increasing processing load without improving accuracy.
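To make the failure mode concrete, here is a small plain-Python model of a left join over 5-second fixed windows. It is not Beam code, only a sketch of the semantics; the event tuples, the key `k1`, and the helper `fixed_window_left_join` are made up for illustration.

```python
WINDOW_SECONDS = 5

def window_start(ts):
    # Start of the fixed window that contains timestamp ts.
    return ts - ts % WINDOW_SECONDS

def fixed_window_left_join(main_events, side_events):
    # Index secondary values by (window, key); a later update in the
    # same window overwrites an earlier one.
    side_by_window = {}
    for ts, key, value in sorted(side_events):
        side_by_window[(window_start(ts), key)] = value

    joined = []
    for ts, key, payload in sorted(main_events):
        # Left join: a main event only sees secondary data from its own window.
        match = side_by_window.get((window_start(ts), key))
        joined.append((ts, key, payload, match))
    return joined

# Hypothetical data: one secondary update at t=2, main events at t=1, 6, 12.
main_events = [(1, "k1", "a"), (6, "k1", "b"), (12, "k1", "c")]
side_events = [(2, "k1", "v1")]

result = fixed_window_left_join(main_events, side_events)
for row in result:
    print(row)
```

Only the main event that shares a window with the update gets a value. The events at t=6 and t=12 get `None` on the right side, even though `v1` is still the current value.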

You can try keying both sources on a common key. This should be the identifier that your join is based on, and the Pub/Sub messages from both the main and the secondary source need to include it. If a BigQuery table update affects multiple records, the secondary source message should include all relevant keys. Then use a global window for the secondary source, so that its data persists until it is explicitly cleared.
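The idea of keeping the secondary source's data around per key can be sketched in plain Python as follows. This is not Beam code and not a drop-in solution; it only models the behaviour you would want, assuming events are processed in timestamp order. The `"main"`/`"side"` tags, the key `k1`, and the helper `enrich_with_latest` are hypothetical names.

```python
def enrich_with_latest(events):
    """Left-join main events with the latest secondary value per key.

    events: iterable of (timestamp, source, key, value), where source
    is "main" or "side" (hypothetical tags for the two sources).
    """
    latest = {}  # key -> most recent secondary value, kept until overwritten
    enriched = []
    for ts, source, key, value in sorted(events):
        if source == "side":
            # A secondary update replaces the stored value for its key.
            latest[key] = value
        else:
            # A main event reads whatever value is currently stored.
            enriched.append((ts, key, value, latest.get(key)))
    return enriched

# Hypothetical data: secondary updates at t=2 and t=13.
events = [
    (1, "main", "k1", "a"),
    (2, "side", "k1", "v1"),
    (6, "main", "k1", "b"),
    (12, "main", "k1", "c"),
    (13, "side", "k1", "v2"),
    (14, "main", "k1", "d"),
]

result = enrich_with_latest(events)
for row in result:
    print(row)
```

Here the main events at t=6 and t=12 still see `v1`, and the event at t=14 sees `v2`. The event at t=1 arrives before any secondary update and still gets `None`, so an initial load of the table would be needed as well. In a real pipeline, out-of-order arrival between the two sources also has to be handled.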

Also, I figured this article might be helpful to you.

Upvotes: 0
