Avinash

Reputation: 603

Flink shaded Hadoop S3 filesystem still requires hdfs-default and hdfs-site config paths

I'm trying to configure S3 as my state backend with Flink 1.6.0.

flink-conf.yaml
state.backend: filesystem
state.checkpoints.dir: s3://***/flink-checkpoints
state.savepoints.dir: s3://***/flink-savepoints

s3.access-key: *******
s3.secret-key: *******

https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/aws.html#shaded-hadooppresto-s3-file-systems-recommended

I have moved flink-s3-fs-hadoop-1.6.0.jar to the lib directory. The docs don't mention any need for Hadoop configuration files with this particular method, yet I'm getting this error, which complains about a missing Hadoop configuration path.

2018-08-24 23:25:17,829 INFO org.apache.flink.streaming.runtime.tasks.StreamTask           - State backend is set to heap memory (checkpoints to filesystem "s3://***/flink-checkpoints")
2018-08-24 23:25:17,831 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory  - Creating Hadoop file system (backed by Hadoop s3a file system)
2018-08-24 23:25:17,831 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory  - Loading Hadoop configuration for Hadoop s3a file system
2018-08-24 23:25:17,872 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils  - Cannot find hdfs-default configuration-file path in Flink config.
2018-08-24 23:25:17,873 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils  - Cannot find hdfs-site configuration-file path in Flink config.
2018-08-24 23:25:17,873 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils  - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables).
2018-08-24 23:25:17,878 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Custom Source -> Map -> Sink: Print to Std. Out (1/1) (ee0eeb00ea0f01043d90f6b8d3c0cc2e) switched from RUNNING to FAILED.
javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
        at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
        at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
        at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2565)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2541)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2424)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.set(Configuration.java:1149)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.set(Configuration.java:1121)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.HadoopConfigLoader.loadHadoopConfigFromFlink(HadoopConfigLoader.java:101)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.HadoopConfigLoader.getOrLoadHadoopConfig(HadoopConfigLoader.java:80)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory.create(AbstractFileSystemFactory.java:55)
        at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:395)
        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:318)
        at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.<init>(FsCheckpointStorage.java:61)
        at org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:443)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:257)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
        at java.lang.Thread.run(Thread.java:748)

Am I missing something here? Any help is appreciated.

Upvotes: 0

Views: 1413

Answers (1)

Avinash

Reputation: 603

I had messed up my dependencies, and that is what caused this seemingly unrelated exception. I was trying out the Bucketing and Rolling Sink connectors, which require Hadoop dependencies. I added them in Maven's provided scope but then could not run the job from IntelliJ IDEA, so I switched them to compile scope and left it that way. They were then packaged into the artifact jar and caused this issue.

Lesson learnt: never add Hadoop dependencies in the default (compile) scope. IntelliJ has an option in the Run Configuration to include dependencies declared in provided scope.
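
For anyone hitting the same thing, here is a rough sketch of what the relevant part of the pom.xml could look like. The artifact names and versions are assumptions for illustration only (the exact Hadoop artifacts I used are not listed above); the point is just the provided scope on the Hadoop dependency.

```xml
<dependencies>
  <!-- Assumed connector artifact for the Bucketing/Rolling sinks; adjust the
       Scala suffix and version to match your own setup. -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.11</artifactId>
    <version>1.6.0</version>
  </dependency>

  <!-- Assumed Hadoop artifact. The provided scope keeps it on the compile
       classpath but out of the packaged job jar, so it cannot clash with
       the shaded classes inside flink-s3-fs-hadoop. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- hadoop.version is a hypothetical property; define it in <properties>. -->
    <version>${hadoop.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

With the scope left as provided, running from the IDE needs the Run Configuration option mentioned above so the Hadoop classes are still on the classpath locally.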

Upvotes: 1
