Manuel

Reputation: 591

TypeError: sequence item 0: expected str instance, tuple found while printing beam.PCollection

I'm trying to print a beam.PCollection and I'm a bit puzzled why it does not work.

I have one collection (col1) consisting of 2 elements with 2 different keys:

('Key2',
 ['file:///Users/DiesDas.txt'])
('Key3',
 ['file:///Users/DiesDas2.txt'])

which I can print using beam.Map(print)

And a second one (col2):

('Key2',
 ['DiesDas1.csv'])

('Key3',
 ['DiesDas2.csv'])

which I can also print.

Calling

grouped = (
    col1,
    col2,
) >> beam.CoGroupByKey()

works fine, however

grouped = (
    col1,
    col2,
) >> beam.CoGroupByKey() | "print" >> beam.Map(print)

fails with TypeError: sequence item 0: expected str instance, tuple found

Any ideas what the issue could be, or how best to debug this?
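One quick way to see how Python groups the failing line, without running a pipeline at all, is to parse it with the standard-library `ast` module. This is a debugging sketch added for illustration; it only inspects the syntax, so `apache_beam` does not need to be installed and `col1`/`col2` are never evaluated.

```python
import ast

# The failing statement from the question, as source text.
failing = '(col1, col2) >> beam.CoGroupByKey() | "print" >> beam.Map(print)'

# mode="eval" parses a single expression; .body is its top-level node.
tree = ast.parse(failing, mode="eval").body

print(type(tree.op).__name__)       # BitOr: | is the outermost operator
print(type(tree.left.op).__name__)  # RShift: (col1, col2) >> beam.CoGroupByKey()
print(type(tree.left.left).__name__)  # Tuple: the tuple is an operand of >>, not of |
print(type(tree.right.op).__name__)   # RShift: "print" >> beam.Map(print)
```

Because `>>` binds more tightly than `|`, the tuple ends up on the left of `>>` rather than being piped into `CoGroupByKey` with `|`.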

Upvotes: 0

Views: 1167

Answers (2)

CaptainNabla

Reputation: 1166

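The problem is in your last statement: the tuple of collections has to be piped into CoGroupByKey with |, not >>, and the whole chain wrapped in brackets so it can span several lines. With >>, Python uses (col1, col2) as the label of the transform instead of its input, so nothing is applied (which is why the version without the print step raises no error); chaining "print" >> beam.Map(print) onto it then tries to join that tuple label into a string, which raises the TypeError. Executing this on play.beam.apache.org works as expected: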

import apache_beam as beam
with beam.Pipeline() as pipeline:
  col1 = (
    pipeline
    | 'Create col1' >> beam.Create([
      ('Key2', ['file:///Users/DiesDas.txt']),
      ('Key3', ['file:///Users/DiesDas2.txt'])
    ])
  )

  col2 = (
    pipeline
    | 'Create col2' >> beam.Create([
      ('Key2', ['DiesDas1.csv']),
      ('Key3', ['DiesDas2.csv'])
    ])
  )

  grouped = (  # outer brackets let the chain span several lines
    (col1, col2)
    | "merge" >> beam.CoGroupByKey()  # apply with |, not >>
    | "print" >> beam.Map(print)
  )
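Why the original line fails, and why the version without the print step seemed to work, can be illustrated with a toy model of how labels are combined. The `ToyTransform` class below is hypothetical: it only mimics, as an assumption, the label handling behind Beam's `>>` and `|` operators (chained labels joined with "|", which matches the error message). It is not the real `PTransform`, and the strings "col1"/"col2" stand in for the actual PCollections.

```python
class ToyTransform:
    """Hypothetical stand-in for a Beam PTransform (NOT the real class)."""

    def __init__(self, label):
        self.label = label

    def __rrshift__(self, label):
        # `x >> transform`: x becomes the label, whatever its type.
        return ToyTransform(label)

    def __or__(self, other):
        # `transform | transform`: chain two transforms; the combined
        # label is built with "|".join(), which only accepts strings.
        return ToyTransform("|".join(t.label for t in (self, other)))

    def __ror__(self, inputs):
        # `inputs | transform`: this is what actually applies a transform.
        return f"applied {self.label!r} to {inputs!r}"


col1, col2 = "col1", "col2"  # hypothetical stand-ins for the PCollections

# Without the print step nothing fails, but nothing is applied either:
# the tuple just becomes the label of an unapplied transform.
unapplied = (col1, col2) >> ToyTransform("CoGroupByKey")
print(unapplied.label)  # ('col1', 'col2')

# With the print step, >> binds tighter than |, so two labeled transforms
# are chained and the tuple label breaks "|".join().
message = ""
try:
    (col1, col2) >> ToyTransform("CoGroupByKey") | "print" >> ToyTransform("Map")
except TypeError as err:
    message = str(err)
print(message)  # sequence item 0: expected str instance, tuple found

# Fixed form: | applies the labeled transform to the tuple of inputs.
result = (col1, col2) | "merge" >> ToyTransform("CoGroupByKey")
print(result)  # applied 'merge' to ('col1', 'col2')
```

The toy drops the wrapped transform when relabeling, since only the label handling matters for reproducing this error.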

Upvotes: 1

Lanre

Reputation: 330

The TypeError actually explains the issue: beam.Map requires a string, but you are supplying a tuple. I would suggest checking the library docs for the allowed inputs.

Upvotes: 0
