Rajesh
Rajesh

Reputation: 151

Hazlecast Jet Cluster. Work load not distributed

I have one huge csv file. I have a Jet cluster with 3 nodes. When the job is submitted only one node processes the entire file. What I want is the each part of work can be distributed. Meaning, how can I optimally use the resources in each of the nodes to get the work done faster.

    Pipeline p = Pipeline.create();

    BatchSource<List<String>> source = Sources.filesBuilder("files")
            .glob("*.csv")
            .build(path -> Files.lines(path).skip(1).map(line -> split(line)));

    p.readFrom(source)
            .map(function1)
            .map(function2)
            .writeTo(Sinks.filesBuilder("out").build());
    instance.newJob(p).join();

Upvotes: 1

Views: 73

Answers (1)

Mike Yawn
Mike Yawn

Reputation: 886

In Jet 4.2 the rebalance() operator was introduced, I think this will do exactly what you need. By default a non-partitioned data source gets processed on a single node, but adding a rebalance() will distribute the work.

See https://jet-start.sh/docs/api/more-transforms#rebalance

The rebalance() would go between the readFrom(source) and map(function1)

Upvotes: 1

Related Questions