Vibhor Jain
Vibhor Jain

Reputation: 1476

Incorrect 'key' value in Map transform

apache-beam==2.23.0 Python 3.8.5 DirectRunner

In my Map transform, I'm trying to extract Key value for each tuple element (after GroupByKey transform upstream). But the Output is always a string 'KeyParam' instead actual key value

here's minimal code:

pipeline code

p| beam.Create([("2","elem2.1"),("1","elem1.1"),("1","elem1.2")]) \
|"group" >>beam.GroupByKey() \
| "log_PCollection_AfterGrouped" >> beam.Map(myRawProcessor.myReader) \

Map transform code

class myRawProcessor():
    @classmethod
    def myReader(self,e,
              timestamp=beam.DoFn.TimestampParam,
              window=beam.DoFn.WindowParam,
              watermark=beam.DoFn.WatermarkEstimatorParam,
              key=beam.DoFn.KeyParam,
               *args, **kwargs):
        print("=== === ===")
        print(e)
        print(key)
        return e

Output

> === === === 
> ('2', ['elem1.1']) 
> KeyParam -----> EXPECTED :: '2'
> === === === 
> ('1', ['elem1.2', 'elem1.3']) 
> KeyParam ----> EXPECTED :: '1'

Upvotes: 5

Views: 156

Answers (1)

robertwb
robertwb

Reputation: 5104

This is a bug, see BEAM-10780. In the meantime, avoid using DoFn.KeyParam in this context.

Upvotes: 1

Related Questions