Reputation: 8450
I'm trying to get a sample of the items in a PCollection using the Python SDK on Dataflow / Beam.
While it's not documented, Sample.FixedSizeGlobally(n) exists.
When testing, it seems to return a PCollection with a single item: a list containing the samples, rather than a PCollection of the samples themselves. Is that correct?
Is the following the best way of turning that single-item PCollection into a PCollection of the items?
| Sample.FixedSizeGlobally(sample_size)
| beam.FlatMap(lambda x: x)
Upvotes: 2
Views: 1561
Reputation: 11021
Currently, yes. The Sample.FixedSizeGlobally() transform returns a PCollection with a single list element. You can turn it into a PCollection of individual elements, as you said:
Sample.FixedSizeGlobally(sample_size)
| beam.FlatMap(lambda x: x)
We'll make sure to add a PCollection-to-PCollection version of this transform, and we also welcome your contributions to Beam :) In the meantime, that's what we've got.
Upvotes: 4