Reputation: 581
I am having trouble understanding how we are supposed to test our pipeline using the Google Dataflow (Apache Beam-based) Python SDK.
https://beam.apache.org/documentation/pipelines/test-your-pipeline/ https://cloud.google.com/dataflow/pipelines/creating-a-pipeline-beam
The above links are ONLY for Java. I am pretty confused as to why Google points to the Java Apache Beam testing documentation.
I want to be able to view the results of a CoGroupByKey join on two PCollections. I am coming from a Python background, and I have little to no experience using Beam/Dataflow.
I could really use any help. I know this is open-ended to an extent; basically I need to be able to view results within my pipeline, and right now I cannot see the results of my CoGroupByKey join.
Code Below
# dwsku, product are PCollections coming from BigQuery. Product contains
# nested values as well; dwsku does not.
d1 = {'dwsku': dwsku, 'product': product}
results = d1 | beam.CoGroupByKey()
print(results)
What is printed:
PCollection[CoGroupByKey/Map(_merge_tagged_vals_under_key).None]
Upvotes: 7
Views: 4628
Reputation: 10683
If you want to test it locally on your machine, you should start by using the DirectRunner.
You will then be able to debug it, either by printing logs or by stopping execution in a debugger.
In order to see whole PCollection locally you can do the following:
d1 = {'dwsku': dwsku, 'product': product}
results = d1 | beam.CoGroupByKey()
def my_debug_function(pcollection_as_list):
    # add a breakpoint in this function or just print
    print(pcollection_as_list)
debug = (results | beam.combiners.ToList() | beam.Map(my_debug_function))
There are a few things to remember here:
- the ToList() transform can potentially allocate a lot of memory
- when using the DirectRunner, you should call the .wait_until_finish() method of your pipeline, so that your script does not end before the pipeline finishes executing
- it might be useful to add a LIMIT to the query when running locally

Upvotes: 3