jlim

Reputation: 1039

Flink writing to HDFS encrypted zone

I have an HDFS encryption zone in my current environment. I am using a Flink 1.2.0 cluster to write Parquet files into the encryption zone with the Apache Parquet writer, version 1.9.0. I have no problem writing to a non-encrypted zone, but the moment I write to the encrypted zone, I get the following error:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.UnknownCryptoProtocolVersionException): No crypto protocol versions provided by the client are supported. Client provided: [] NameNode supports: [CryptoProtocolVersion{description='Unknown', version=1, unknownValue=null}, CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.chooseProtocolVersion(FSNamesystem.java:2538)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2682)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2599)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:595)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:112)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:395)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

at org.apache.hadoop.ipc.Client.call(Client.java:1406)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy10.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:245)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy12.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1425)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1449)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1374)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:386)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:386)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:330)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:907)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:239)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:273)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:222)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:188)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:158)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:124)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:97)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:71)
at org.apache.parquet.avro.AvroParquetWriter.<init>(AvroParquetWriter.java:54)
at com.dataplatform.integration.flink.sink.BucketParquetWriter.write(BucketParquetWriter.java:160)
at com.dataplatform.integration.flink.sink.BucketParquetSink.invoke(BucketParquetSink.java:347)
at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:38)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:185)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:261)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:656)
at java.lang.Thread.run(Thread.java:745)
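For reference, the writer is created roughly along these lines (a minimal sketch; the schema, path, and record are placeholders, not the actual BucketParquetWriter code):

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;

    public class EncryptedZoneWriteDemo {
        public static void main(String[] args) throws IOException {
            // Placeholder schema; the real job uses its own Avro schema.
            Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"Event\","
                    + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");

            // Placeholder output path inside the HDFS encryption zone.
            Path out = new Path("hdfs:///encrypted-zone/events/part-0.parquet");

            // Parquet 1.9.0 constructor; the exception above is thrown here,
            // when the writer opens the file via FileSystem.create().
            AvroParquetWriter<GenericRecord> writer =
                    new AvroParquetWriter<>(out, schema);

            GenericRecord record = new GenericData.Record(schema);
            record.put("id", "abc-123");
            writer.write(record);
            writer.close();
        }
    }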

I have read the Hadoop release notes and learned that Hadoop TDE (Transparent Data Encryption) was introduced in version 2.6.0. However, my Flink build is compiled against Hadoop version 2.3.0. Will this cause any incompatibility issue as far as TDE is concerned?

Upvotes: 1

Views: 383

Answers (1)

mbalassi

Reputation: 191

You can download Flink prebuilt against a different Hadoop version, for example 2.6.0, or you can build Flink yourself and choose the Hadoop version by passing -Dhadoop.version=2.6.0 to Maven.
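For example, building from source would look something like this (a sketch; adjust the version to match your cluster, and -DskipTests is just to speed up the build):

    mvn clean install -DskipTests -Dhadoop.version=2.6.0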

Upvotes: 1
