Write a particular PCollection to BigQuery

Question

Suppose I create two output PCollections using side outputs and, depending on some condition, I want to write only one of them to BigQuery. How can I do this?

Basically, my use case is making the choice between WRITE_APPEND and WRITE_TRUNCATE dynamic. I fetch the setting (append or truncate) from a config table that I maintain in BigQuery, and depending on what the config table contains I must apply either truncate or append.
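One possible alternative, sketched here under assumptions rather than as a confirmed answer, is to read the config value once while constructing the pipeline and pick a single disposition from it, so that only one write is ever applied. The snippet shows only the mapping step in plain Java. `WriteMode` and `ConfigDisposition` are hypothetical names: `WriteMode` is a stand-in that mirrors the `WRITE_APPEND` / `WRITE_TRUNCATE` constants of `BigQueryIO.Write.WriteDisposition` so the sketch compiles without the Beam SDK, and how the value is fetched from the config table is left out.

```java
import java.util.Locale;

// Hypothetical stand-in for BigQueryIO.Write.WriteDisposition, so the sketch
// compiles without the Beam SDK on the classpath.
enum WriteMode { WRITE_APPEND, WRITE_TRUNCATE }

final class ConfigDisposition {
    private ConfigDisposition() {}

    // Maps the value stored in the config table ("append" or "truncate")
    // to exactly one write mode. Unknown values are rejected instead of
    // silently defaulting, since a wrong guess could truncate the table.
    static WriteMode fromConfig(String value) {
        if (value == null) {
            throw new IllegalArgumentException("config value is missing");
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "truncate":
                return WriteMode.WRITE_TRUNCATE;
            case "append":
                return WriteMode.WRITE_APPEND;
            default:
                throw new IllegalArgumentException("unknown config value: " + value);
        }
    }

    public static void main(String[] args) {
        // Assumption: in a real job this value is read from the config table
        // before the pipeline is constructed; here it comes from the command line.
        String configValue = args.length > 0 ? args[0] : "append";
        WriteMode mode = fromConfig(configValue);
        System.out.println("Applying a single write with " + mode);
    }
}
```

In a Beam pipeline the result could then be converted with `BigQueryIO.Write.WriteDisposition.valueOf(mode.name())` and passed to one `BigQueryIO.writeTableRows().withWriteDisposition(...)` call, so only one write transform exists in the graph. The trade-off is that the config is read once at launch and is not re-evaluated while the job runs.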

Using side outputs, I was able to create two PCollections (append and truncate, respectively), one of which will always be empty. The one that contains all the rows must be written to BigQuery. Is this approach correct?
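To make the per-element behaviour of the two-output approach concrete, here is a Beam-free simulation in plain Java. Because the config value is the same for every element, all elements land in one list and the other stays empty. `Routing`, `Outputs` and `route` are hypothetical names; the two lists stand in for the `truncate` and `append` tagged outputs. An empty list here only means "no rows": in the real pipeline, both write transforms are still part of the graph because they were applied at construction time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Routing {
    // Holds the two simulated "tagged outputs" of the DoFn.
    static final class Outputs {
        final List<String> truncate = new ArrayList<>();
        final List<String> append = new ArrayList<>();
    }

    // Mirrors the DoFn: each element goes to exactly one output, chosen by
    // the config value, which is identical for every element.
    static Outputs route(List<String> elements, String configValue) {
        Outputs out = new Outputs();
        for (String element : elements) {
            if ("truncate".equals(configValue)) {
                out.truncate.add(element);
            } else {
                out.append.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("red", "green", "blue");
        Outputs out = route(colors, "truncate");
        // prints "truncate=[red, green, blue] append=[]"
        System.out.println("truncate=" + out.truncate + " append=" + out.append);
    }
}
```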

The code I'm using:

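    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollectionTuple;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TupleTagList;

    // Assumed to be defined elsewhere: `read` (a PCollection<String>),
    // `configView` (a List-valued PCollectionView holding the config value),
    // `schema` and `LOG`.

    // Main output: rows to be written with WRITE_TRUNCATE.
    final TupleTag<TableRow> truncate = new TupleTag<TableRow>() {};
    // Additional output: rows to be written with WRITE_APPEND.
    final TupleTag<TableRow> append = new TupleTag<TableRow>() {};

    PCollectionTuple results = read.apply("convert to table row",
        ParDo.of(new DoFn<String, TableRow>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                // The config value is read from the side input for every element.
                String value = c.sideInput(configView).get(0).toString();
                LOG.info("config: " + value);
                TableRow row = new TableRow().set("color", c.element());
                if (value.equals("truncate")) {
                    LOG.info("outputting to truncate");
                    c.output(row);          // main output (truncate tag)
                } else {
                    LOG.info("outputting to append");
                    c.output(append, row);  // additional output (append tag)
                }
            }
        }).withSideInputs(configView)
          .withOutputTags(truncate, TupleTagList.of(append)));

    results.get(truncate).apply("truncate", BigQueryIO.writeTableRows()
        .to("projectid:datasetid.tableid")
        .withSchema(schema)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

    results.get(append).apply("append", BigQueryIO.writeTableRows()
        .to("projectid:datasetid.tableid")
        .withSchema(schema)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));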

I need to perform only one of the two writes. If I apply both, the table gets truncated anyway.

P.S. I'm using the Java SDK (Apache Beam 2.1).
