Reputation: 2751
I have a pipeline that uses the batch variant of DoFn
(which the docs weren't very helpful for). It looks like this
class MyFn(beam.DoFn):
def process_batch(self, batch: List[MyType]) -> Iterator[List[MyType]]:
# process batches
results = []
for foo in batch:
# do work, add to results
yield results
I've got some logging setup to shows me that my process_batch
method is operating on 4096
items consistently. Does anyone know why its 4096
, or how to make it higher or lower?
Upvotes: 1
Views: 729
Reputation: 388
Currently the BATCH SIZE is hardcoded to 4096 in the batched DoFns. You can raise a feature request in the apache beam community to make it configurable.
Upvotes: 2