Amit Rai

Reputation: 89

Data is not flowing into the PCollectionView stage in Apache Beam (Java)

After reading data from BigQuery, I have to pass it as a side input to the next stage. These are the steps I am following:

  1. Read from BigQuery
  2. Convert the PCollection to a PCollection<KV<K,V>>
  3. Convert that PCollection to a PCollectionView (since the next step accepts only a PCollectionView as a side input)

But data is not flowing into step 3, so I cannot pass it to the next stage, where the side input is consumed.

Below is the Dataflow job graph:

[image: Dataflow job graph]

Below is the Java code snippet:

PCollection<TableRow> bq_row = pipeline.apply(
    "Read from BigQuery query",
    BigQueryIO.readTableRows().withoutValidation().withTemplateCompatibility().fromQuery(query).usingStandardSql());
    
PCollection<KV<String, TableRow>> rowsKeyedByUser = bq_row
    .apply(WithKeys.of(new SerializableFunction<TableRow, String>() {
        @Override
        public String apply(TableRow row) {
            return (String) row.get("asset_id");
        }
    }));
    
PCollectionView<Map<String, TableRow>> bq_row1 = rowsKeyedByUser.apply(
    "viewTags", View.<String, TableRow>asMap());

Please let us know: is there anything missing on my end?

Upvotes: 1

Views: 578

Answers (1)

robertwb

Reputation: 5104

View operations don't really have PCollection outputs; they're just used to wire up side inputs, so the view step showing no output elements is not by itself a sign of a problem. Have you confirmed that the side input in question is not getting any data?
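One way to check is to consume the view in a downstream ParDo and see whether lookups succeed. The following is a minimal sketch, not the asker's actual pipeline: the `EnrichWithSideInput` class, the `enrich` helper, and the `assetIds` main input are hypothetical names introduced for illustration, and the code assumes the Beam Java SDK and the BigQuery `TableRow` model are on the classpath.

```java
import com.google.api.services.bigquery.model.TableRow;
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class EnrichWithSideInput {

  /**
   * Looks up each element of the (hypothetical) main input in the map side input
   * and emits a string for every key that has a matching row.
   */
  static PCollection<String> enrich(
      PCollection<String> assetIds,
      PCollectionView<Map<String, TableRow>> rowsByAssetId) {
    return assetIds.apply(
        "LookupAsset",
        ParDo.of(
                new DoFn<String, String>() {
                  @ProcessElement
                  public void processElement(ProcessContext c) {
                    // The view is only readable inside a DoFn that declared it
                    // via withSideInputs(...) below.
                    Map<String, TableRow> rows = c.sideInput(rowsByAssetId);
                    TableRow row = rows.get(c.element());
                    if (row != null) {
                      c.output(c.element() + " -> " + row);
                    }
                  }
                })
            // Without this call, c.sideInput(...) fails at runtime.
            .withSideInputs(rowsByAssetId));
  }
}
```

With the question's code, `bq_row1` would be passed as `rowsByAssetId`. If this step emits output for known keys, the side input is being populated even though the view step itself shows no elements flowing out.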

If this is a streaming pipeline, do you have your windowing set up correctly? If not, the side input must be computed in its entirety before the DoFn consuming it will emit anything. In particular, side inputs are not available until the watermark passes the end of the corresponding window (that is, until we know the window has everything it will ever have). For the global window, that is essentially the end of time. Triggers can be used if you need to access the values of a window before it closes.
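The availability rule above can be illustrated with a toy model in plain Java. This is not Beam code and does not reflect how any runner is implemented: `SideInputReadiness`, `isReadable`, and the use of `Instant.MAX` as a stand-in for the end of the global window are all assumptions made for the illustration.

```java
import java.time.Instant;

public class SideInputReadiness {

    // Assumption: Instant.MAX stands in for the end of the global window ("the end of time").
    public static final Instant GLOBAL_WINDOW_END = Instant.MAX;

    /**
     * Toy version of the rule: a window's side input is readable once the watermark
     * has passed the end of that window, or earlier if a trigger has already fired.
     */
    public static boolean isReadable(Instant watermark, Instant windowEnd, boolean triggerFired) {
        return triggerFired || watermark.isAfter(windowEnd);
    }

    public static void main(String[] args) {
        Instant watermark = Instant.parse("2024-01-01T00:10:00Z");
        Instant fixedWindowEnd = Instant.parse("2024-01-01T00:05:00Z");

        // Fixed window that ended before the watermark: readable.
        System.out.println(isReadable(watermark, fixedWindowEnd, false));    // true
        // Global window, no trigger: the watermark never gets past the end of time.
        System.out.println(isReadable(watermark, GLOBAL_WINDOW_END, false)); // false
        // Global window with a fired trigger: readable before the window closes.
        System.out.println(isReadable(watermark, GLOBAL_WINDOW_END, true));  // true
    }
}
```

The second case is the one to watch for in a streaming job: a side input left in the global window with the default trigger can keep the consuming DoFn waiting indefinitely when its source is unbounded.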

Upvotes: 1
