Reputation: 2113
I execute my program with a degree of parallelism (dop) > 1, but I do not want multiple output files. In Java,

    myDataSet.writeAsText(outputFilePath, WriteMode.OVERWRITE).setParallelism(1);

works as expected. But when I try the same in Python, it does not work. This is my code:

    myDataSet.write_text(output_file, write_mode=WriteMode.OVERWRITE).set_degree_of_parallelism(1)

Is there a way to achieve this behaviour in Python?
Upvotes: 1
Views: 450
Reputation: 901
For users who are unaware, Apache Flink added this feature a couple of months ago.
Here is the relevant excerpt from the Flink documentation:
The default parallelism can be overwritten for an entire job by calling setParallelism(int parallelism) on the ExecutionEnvironment or by passing -p to the Flink command-line frontend. It can be overwritten for single transformations by calling setParallelism(int parallelism) on an operator.
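For completeness, here is a rough sketch of how the per-operator override could look with the Python DataSet API on a Flink version that includes this feature. This is only a sketch: the module paths (flink.plan.Environment, flink.plan.Constants) and the sink-level set_degree_of_parallelism call are taken from the question, while env.set_parallelism is an assumption to verify against the documentation of the version you run.

    # Sketch: per-operator parallelism in the (legacy) Flink Python DataSet API.
    # Verify the exact method names against your Flink version.
    from flink.plan.Environment import get_environment
    from flink.plan.Constants import WriteMode

    env = get_environment()
    env.set_parallelism(4)  # assumed job-wide default, as in the quoted docs

    data = env.from_elements("a", "b", "c")

    # Override only the sink so that a single output file is written.
    data.write_text("/tmp/out", write_mode=WriteMode.OVERWRITE) \
        .set_degree_of_parallelism(1)

    env.execute(local=True)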
Upvotes: 1
Reputation: 1280
This is not a bug but an unsupported feature. It is currently not possible to set the parallelism for a single operator, only for the complete job.
I have opened a JIRA for this: https://issues.apache.org/jira/browse/FLINK-3275
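Until that is supported, the workaround in the Python API is to lower the parallelism of the whole job. A minimal sketch, assuming the Environment exposes a job-wide setter (set_parallelism or set_degree_of_parallelism, depending on the version):

    # Sketch: run the entire job with parallelism 1 so write_text
    # produces a single output file (at the cost of parallel execution).
    from flink.plan.Environment import get_environment
    from flink.plan.Constants import WriteMode

    env = get_environment()
    env.set_parallelism(1)  # assumed job-wide setter; check your version

    env.from_elements("a", "b", "c") \
        .write_text("/tmp/out", write_mode=WriteMode.OVERWRITE)

    env.execute(local=True)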
Upvotes: 5