Reputation: 184
I am trying to query Parquet files on an HDFS cluster via Apache Drill (distributed mode). I have created a new storage plugin named 'hdfs' with the following configuration:
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://<my-name-node-host>:8020",
  "config": null,
  "workspaces": {
    "root": {
      "location": "/",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "json": {
      "type": "json",
      "extensions": [
        "json"
      ]
    },
    "parquet": {
      "type": "parquet"
    }
  }
}
In HDFS I have the sample file region.parquet in the /user/tj/ folder. Its owner and group are hdfs:hdfs by default, and I would like to keep it that way.
But when I try to query it from the Apache Drill web UI with the following SQL query:
SELECT * FROM hdfs.`/user/tj/region.parquet`
it throws the exception below:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: RemoteException: Permission denied: user=, access=EXECUTE, inode="/user/tj/region.parquet/.drill.parquet_metadata":hdfs:hdfs:-rw-r--r--
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1827)
  at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3972)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1130)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:851)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
[Error Id: 24bf0cf0-0181-4c72-97ee-4b4eb98771bf on :31010]
How do I fix this permission issue so that I can query files on the Hadoop cluster with Apache Drill?
How can I execute the query as the hdfs user?
Upvotes: 0
Views: 553
Reputation: 1054
I think you have to configure user impersonation. You can follow the link below to give Apache Drill view permission on the files. I haven't actually used Apache Drill myself, so please post a comment to say whether it works.
Configure user impersonation link
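For reference, a rough sketch of what enabling impersonation looks like in drill-override.conf on each Drillbit, as far as I understand the Drill documentation. The cluster-id and zk.connect values are placeholders, not taken from the question, and I have not tested this on a real cluster:

```
drill.exec: {
  # Placeholder values: keep whatever your existing config already uses here.
  cluster-id: "<your-cluster-id>",
  zk.connect: "<your-zookeeper-host>:2181",

  # Run queries as the connected user instead of the Drillbit process user.
  impersonation: {
    enabled: true,
    max_chained_user_hops: 3
  }
}
```

The Drillbits need a restart after this change. On the Hadoop side, the user that runs the Drillbit process usually also has to be allowed to act as a proxy user through the hadoop.proxyuser.<drillbit-user>.hosts and hadoop.proxyuser.<drillbit-user>.groups properties in core-site.xml, where <drillbit-user> stands for whichever account starts Drill in your setup.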
Upvotes: 0