Reputation: 591
I'm trying to print a beam.PCollection and I'm a bit puzzled why it does not work.
I have one collection consisting of 2 elements with 2 different keys (col1):
('Key2', ['file:///Users/DiesDas.txt'])
('Key3', ['file:///Users/DiesDas2.txt'])
which I can print using beam.Map(print).
And a second one (col2):
('Key2', ['DiesDas1.csv'])
('Key3', ['DiesDas2.csv'])
which I can also print.
Calling
grouped = (
    col1,
    col2,
) >> beam.CoGroupByKey()
works fine, however
grouped = (
    col1,
    col2,
) >> beam.CoGroupByKey() | "print" >> beam.Map(print)
fails with TypeError: sequence item 0: expected str instance, tuple found
Any ideas what could be the issue or how to best debug this?
Upvotes: 0
Views: 1167
Reputation: 1166
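You are missing a set of brackets in your last statement, and the tuple has to be piped into the transform with | (the >> operator only attaches a label to a transform). Executing this on play.beam.apache.org works as expected: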
import apache_beam as beam

with beam.Pipeline() as pipeline:
    col1 = (
        pipeline
        | 'Create col1' >> beam.Create([
            ('Key2', ['file:///Users/DiesDas.txt']),
            ('Key3', ['file:///Users/DiesDas2.txt'])
        ])
    )
    col2 = (
        pipeline
        | 'Create col2' >> beam.Create([
            ('Key2', ['DiesDas1.csv']),
            ('Key3', ['DiesDas2.csv'])
        ])
    )
    grouped = (  # missing opening bracket
        (col1, col2)
        | "merge" >> beam.CoGroupByKey()
        | "print" >> beam.Map(print)
    )  # missing closing bracket
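To see why the operator choice matters, here is a minimal sketch in plain Python that does not use Beam at all. ToyTransform is a hypothetical stand-in, written on the assumption that Beam treats `label >> transform` as "attach a label" and `transform | transform` as "chain and join the labels"; it is not Beam's actual implementation. Since `>>` binds tighter than `|` in Python, the question's expression ends up using the (col1, col2) tuple as a label.

```python
# Toy stand-in for a Beam transform; NOT Beam's real implementation.
class ToyTransform:
    def __init__(self, label):
        self.label = label

    def __rrshift__(self, left):
        # `left >> transform`: whatever is on the left becomes the label.
        return ToyTransform(left)

    def __or__(self, other):
        # `transform | transform`: chain the two and join their labels.
        return ToyTransform("|".join([self.label, other.label]))

    def __ror__(self, left):
        # `inputs | transform`: the step that actually applies the transform.
        return (self.label, left)


def broken():
    # Parsed as (tuple >> T) | ("print" >> T): the tuple is used as a label.
    return ("col1", "col2") >> ToyTransform("CoGroupByKey") | "print" >> ToyTransform("Map")


def fixed():
    # The tuple is piped in with `|`; only strings are used as labels.
    return (
        ("col1", "col2")
        | "merge" >> ToyTransform("CoGroupByKey")
        | "print" >> ToyTransform("Map")
    )


if __name__ == "__main__":
    try:
        broken()
    except TypeError as err:
        print("broken:", err)
    print("fixed:", fixed())
```

In this toy model the broken variant only fails once the second `|` tries to join the labels, which would also be consistent with the first statement in the question appearing to work on its own.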
Upvotes: 1
Reputation: 330
The TypeError actually explains the issue: beam.Map seems to require a string, but you are supplying a tuple. I would suggest checking the library docs for the allowed inputs.
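For debugging, it may help to know that the wording of this message is what Python's own str.join produces when one of the items is not a string. The sketch below assumes the tuple ended up somewhere a string label was expected; the labels list and the find_non_string_labels helper are hypothetical and only illustrate how to spot the offending item.

```python
def find_non_string_labels(labels):
    """Return (index, value) pairs for every label that is not a str."""
    return [(i, v) for i, v in enumerate(labels) if not isinstance(v, str)]


# Hypothetical labels: a tuple slipped in where a string was expected.
labels = [("col1", "col2"), "print"]

if __name__ == "__main__":
    try:
        "|".join(labels)
    except TypeError as err:
        print(err)
    # Shows which position holds the non-string value.
    print(find_non_string_labels(labels))
```

Checking the type of whatever sits on the left of each `>>` is usually enough to locate the culprit.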
Upvotes: 0