Reputation: 13
I'm using VMs, and my cluster consists of 3 TaskManagers plus a master that acts as both JobManager and TaskManager (so 4 TaskManagers and one JobManager in total). When I run the jar file, it uses just one slot, even though I have 4 slots available, one per TaskManager. I don't know why the framework doesn't use all of the available slots. I would also like to know: should the dataset be present on each TaskManager?
Upvotes: 1
Views: 683
Reputation: 43419
The answer to your question depends somewhat on which cluster manager you are using (e.g., YARN, Mesos, Kubernetes, or standalone), but in general Flink does not support autoscaling (yet), so you need to explicitly configure the desired parallelism. You can do this in the source code for the job, in flink-conf.yaml, or on the command line. If you don't, your jobs will run with the default parallelism, which is 1 unless configured otherwise. That would explain why only one slot is being used.
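As a rough sketch of the flink-conf.yaml route, something like the following should work. The value 4 is an assumption chosen to match the four slots described in the question, and your-job.jar is a placeholder name, not something from the original post. The other two routes are shown only as comments.

```yaml
# flink-conf.yaml (keep it consistent on the JobManager and every TaskManager)

# Slots offered by each TaskManager; the question describes one slot per TaskManager.
taskmanager.numberOfTaskSlots: 1

# Parallelism used by jobs that do not set their own.
# 4 is an assumption matching the 4 slots in the question.
parallelism.default: 4

# Alternatives, shown as comments because they do not belong in this file:
#   command line:  flink run -p 4 your-job.jar    (your-job.jar is a placeholder)
#   in the job:    env.setParallelism(4);         (env is the job's execution environment)
```

If more than one of these is set, a value set in the job's code takes precedence over the -p flag, which in turn takes precedence over parallelism.default. The cluster usually needs a restart before changes to flink-conf.yaml take effect.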
As for where you should put the data, your Flink application will read data using a source connector. If you are using a filesystem as the data source, then every TaskManager will need to be able to read the data using the same filesystem URI -- which is best accomplished by using a distributed filesystem.
Upvotes: 1