Mustafa

Reputation: 10413

Apache Flink using with Hadoop 2.8.0 for S3A Path Style Access

I am trying to use an S3 backend with a custom endpoint. However, it is not supported in [email protected]; I need at least version 2.8.0. The underlying reason is that the requests are being sent as follows:

DEBUG [main] (AmazonHttpClient.java:337) - Sending Request: HEAD http://mustafa.localhost:9000 / Headers: 

This happens because fs.s3a.path.style.access is not recognized in the old version. I want the domain to remain the same and the bucket name to be appended to the path (http://localhost:9000/mustafa/...).
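For reference, a sketch of the relevant s3a settings in core-site.xml (the property names are the ones Hadoop 2.8.0 understands; the endpoint value is an example, not taken from my actual setup):

```xml
<!-- Point s3a at the custom endpoint instead of AWS -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>http://localhost:9000</value>
</property>
<!-- Use path-style access (http://host/bucket/key) rather than
     virtual-host-style (http://bucket.host/key); only available
     from hadoop-aws 2.8.0 onwards -->
<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
</property>
```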

I cannot blindly bump aws-java-sdk to the latest version, because that causes:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.ClientConfiguration
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:182)

So, if I increase hadoop-aws to 2.8.0 with the latest client, it causes the following error:

Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
    at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:194)

According to https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#provide-s3-filesystem-dependency, I need [email protected].

Should I be excluding hadoop-common from Flink somehow? Building Flink from source with mvn clean install -DskipTests -Dhadoop.version=2.8.0 works, but I want to manage it via Maven as much as possible.

Upvotes: 0

Views: 2395

Answers (1)

stevel

Reputation: 13430

  1. Don't try and mix Hadoop JARs; it won't work, and all support JIRAs will be rejected.
  2. In Maven you could try excluding the Hadoop 2.7 dependencies from your Flink import, and then explicitly pull in hadoop-client, hadoop-aws, ... I don't have a Flink setup, but here is one for Spark, designed to let me mix Hadoop 3.0 beta builds with Spark 2.2, excluding the Hadoop stuff from Spark and the Jackson and Jetty bits from Hadoop. Yes, it hurts, but that's the only way I've been able to completely control what I end up with.
  3. No idea about flink-snapshot; it'll depend on what it was built with. Ask on the mailing list.
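The exclusion approach in point 2 could look roughly like this in a pom.xml. This is an illustrative sketch only: the Flink artifact id and versions are assumptions (I haven't verified them against a specific Flink release), and the wildcard exclusion requires Maven 3.2.1 or later:

```xml
<!-- Exclude all transitive Hadoop artifacts from the Flink dependency
     (artifact id and version here are hypothetical examples) -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.3.2</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- Then pin one consistent Hadoop version explicitly -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.8.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>2.8.0</version>
</dependency>
```

Running mvn dependency:tree afterwards is a quick way to confirm no 2.7.x Hadoop artifacts are still leaking in transitively.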

Upvotes: 2
