zhangshengxiong

Reputation: 383

How to Reference the External Jar in Flink

Hi everyone. I tried to reference my company's jar in Flink by copying it to $FLINK/lib on all TaskManagers, but it failed. I don't want to package a fat jar, because it is too heavy and a waste of time. I also don't think the first method is a good idea, because I would have to manage jars across the whole cluster. Does anyone know how to resolve this problem? Any suggestions would be appreciated.

Upvotes: 11

Views: 6841

Answers (3)

user6541820


If you want to avoid dependency conflicts, don't copy your jars to ${FLINK}/lib. If you use yarn-cluster as your master, you can use the -yt (--yarnship) option: it copies the jars in the given directory to HDFS and adds them to the classpath of your distributed program.
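As a rough sketch of what such a submission could look like, the helper below only prints the command rather than running it. The function name, the directory /opt/deps and the jar name are placeholders for illustration, not anything from the answer above.

```shell
#!/bin/sh
# Hypothetical helper: print a "flink run" call that ships a directory
# of dependency jars to YARN via -yt. Nothing is submitted here.
yarn_ship_command() {
  ship_dir="$1"   # directory holding the extra jars (assumed path)
  job_jar="$2"    # the thin job jar (assumed name)
  printf 'flink run -m yarn-cluster -yt %s %s\n' "$ship_dir" "$job_jar"
}

yarn_ship_command /opt/deps my-job.jar
```

Printing first makes it easy to check the command before pasting it into a real submission.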

Upvotes: 1

user2108278

Reputation: 401

Flink's Command Line Interface (CLI) allows passing additional jar locations using the -C (--classpath) option. We use it to pass dependencies to each job.

Our problem: Our jobs evolve over the lifetime of the project, their external dependencies change versions, and we run several processes in the same cluster. We therefore wanted to select the exact jar versions to load on each run, and the $FLINK/lib directory was not enough for that.

Details: We distribute the jars to a fixed directory (different from $FLINK/lib) on every node. We then start the job through the CLI. Because the call is quite long, we do not type it directly but wrap it in a bash script.
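A wrapper along these lines might look like the following sketch. It is not the script described above: the function name and the example directory are made up for illustration, and it assumes jar paths without spaces. The -C option expects a URL with a protocol (for example file://) that is reachable on every node, which is why the jars have to sit in the same directory cluster-wide.

```shell
#!/bin/sh
# Hypothetical helper: emit one "-C file://<jar>" pair for every jar
# found in a dependency directory (pass an absolute path).
build_classpath_args() {
  dep_dir="$1"
  args=""
  for jar in "$dep_dir"/*.jar; do
    [ -e "$jar" ] || continue        # glob matched nothing: skip
    args="$args -C file://$jar"
  done
  printf '%s\n' "${args# }"          # drop the leading space
}

# Example use (directory and job jar are assumptions):
#   flink run $(build_classpath_args /opt/job-deps/v1.2) my-job.jar
```

Keeping one directory per dependency version lets each run pick the exact set of jars it needs by changing a single argument.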

Upvotes: 1

Matthias J. Sax

Reputation: 62330

In general, building a fat jar is the best way to go. How big does your fat jar get that you consider it "too heavy"?

Copying jars to $FLINK/lib should work. However, you need to restart Flink so that the jars are added to Flink's classpath. This approach therefore does not let you add jars dynamically, but it should work for a set of stable jars.

To manage jars across the whole cluster, it might help to use an NFS folder as $FLINK/lib to keep all TaskManagers in sync. Alternatively, you can simply write a bash script to distribute your jars.
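A distribution script of that kind could be as small as the sketch below. It is only an assumption of how one might do it: the hosts file (one TaskManager host per line), the directories and the function name are hypothetical, and the function prints the scp commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical helper: print one scp command per host listed in a file.
# Blank lines and lines starting with '#' are ignored.
print_copy_commands() {
  hosts_file="$1"   # file with one host name per line (assumed format)
  jar_dir="$2"      # local directory containing the jars
  target_dir="$3"   # remote directory, e.g. the Flink lib folder
  while IFS= read -r host || [ -n "$host" ]; do
    case "$host" in
      ''|'#'*) continue ;;
    esac
    printf 'scp %s/*.jar %s:%s/\n' "$jar_dir" "$host" "$target_dir"
  done < "$hosts_file"
}
```

Piping the output to sh would run the copies. Flink still has to be restarted afterwards to pick up the new jars.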

Upvotes: 14
