Conditional statement Python Apache Beam pipeline

Question

Current situation

The purpose of this pipeline is to read a payload with geodata from Pub/Sub, transform and analyze this data, and finally return whether a condition is true or false.

import apache_beam as beam

# Must be defined before the pipeline is built, because the with-block runs immediately.
def GeoDataIngestion(string_input):
    # Transforms and analyzes the geodata; returns True or False.
    <...>


with beam.Pipeline(options=pipeline_options) as p:
    raw_data = (p
                | 'Read from PubSub' >> beam.io.ReadFromPubSub(
                    subscription='projects/XXX/subscriptions/YYY'))

    geo_data = (raw_data
                | 'Geo data transform' >> beam.Map(GeoDataIngestion))

Desirable situation 1

If the GeoDataIngestion result is True, then raw_data should be stored in BigQuery.

geo_data = (raw_data
            | 'Geo data transform' >> beam.Map(GeoDataIngestion)
            | 'Evaluate condition' >> beam.Map(Condition))

def Condition(condition):
    if condition:
        <...WriteToBigQuery...>


The class I used before to store raw_data unconditionally (without evaluating the condition):

class WriteToBigQuery(beam.PTransform):
    def expand(self, pcoll):
        return (
            pcoll
            | 'Format' >> beam.ParDo(FormatBigQueryFn())
            | 'Write to BigQuery' >> beam.io.WriteToBigQuery(
                'XXX',
                schema=TABLE_SCHEMA,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

Desirable situation 2

Instead of storing the data in BigQuery, it would also be good to send it to Pub/Sub:

def Condition(condition):
    if condition:
        <...SendToPubSub(Topic1)...>
    else:
        <...SendToPubSub(Topic2)...>

The problem here is choosing the topic based on the condition result, because I'm not able to pass the topic as a parameter to the pipeline step

| beam.io.WriteStringsToPubSub(TOPIC)

nor inside a function or class.

Question

How can I do that?

How/where should I call WriteToBigQuery to store the raw_data PCollection if the result of 'Evaluate condition' is True?
