Robin Ellerkmann
Robin Ellerkmann

Reputation: 2113

Set degree of parallelism for a single operation in Python

I execute my program with a dop > 1 but I do not want multiple output files. In Java myDataSet.writeAsText(outputFilePath, WriteMode.OVERWRITE).setParallelism(1);is working as expected.

But when I try the same in Python it does not work. This is my code: myDataSet.write_text(output_file, write_mode=WriteMode.OVERWRITE).set_degree_of_parallelism(1)

Is there a possibilty to achieve this behaviour in Python?

Upvotes: 1

Views: 450

Answers (2)

Ravikumar
Ravikumar

Reputation: 901

For users who are unaware, Apache Flink added this feature couple of months back.

Here is the short doc from Flink :-

The default parallelism can be overwriten for an entire job by calling setParallelism(int parallelism) on the ExecutionEnvironment or by passing -p to the Flink Command-line frontend. It can be overwritten for single transformations by calling setParallelism(int parallelism) on an operator.

Upvotes: 1

Chesnay Schepler
Chesnay Schepler

Reputation: 1280

This is not a bug but an unsupported feature. It is currently not possible to set the parallelism for a single operator, but only the complete job.

I have opened a JIRA for this: https://issues.apache.org/jira/browse/FLINK-3275

Upvotes: 5

Related Questions