Reputation: 11031
I have a PCollection, and I would like to use a ParDo to filter out some elements from it.
Is there a place where I can find an example for this?
Upvotes: 5
Views: 8234
Reputation: 11031
In the Apache Beam Python SDK, there is a Filter transform that receives a lambda, and filters out all elements that return False
. Here is an example:
filtered_collection = (beam.Create([1, 2, 3, 4, 5])
beam.Filter(lambda x: x % 2 == 0))
In this case, filtered_collection
will be a PCollection
that contains 2
, and 4
.
If you want to code this as a DoFn that is passed to a ParDo transform, you would do something like this:
class FilteringDoFn(beam.DoFn):
def process(self, element):
if element % 2 == 0:
yield element
else:
return # Return nothing
and you can apply it like so:
filtered_collection = (beam.Create([1, 2, 3, 4, 5])
beam.ParDo(FilteringDoFn()))
where, like before, filtered_collection
is a PCollection
that contains 2
, and 4
.
Upvotes: 13