Reputation: 77
I am creating a Spark session object to store data into a Hive table, as:
_sparkSession = SparkSession.builder()
        .config(_sparkConf)
        .config("spark.sql.warehouse.dir", "/user/platform")
        .enableHiveSupport()
        .getOrCreate();
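For context, data is then written through this session roughly as in the sketch below (the input path and table name are placeholders, not from the actual job):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

// Placeholder source; any DataFrame works here.
Dataset<Row> df = _sparkSession.read().parquet("/user/platform/input");
// saveAsTable creates a managed table whose files live under the warehouse directory.
df.write().mode(SaveMode.Overwrite).saveAsTable("platform_db.events");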
After deploying my JAR to the server, I get the exception below:
Caused by: org.apache.spark.sql.AnalysisException:
org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:org.apache.hadoop.security.AccessControlException:
Permission denied: user=diplatform, access=EXECUTE,
inode="/apps/hive/warehouse":hdfs:hdfs:d---------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
In my hive-site.xml I set the configuration below. We add this XML to our Spark job so that the default XML at /etc/hive/conf is overridden (a builder-based alternative is sketched after the XML):
<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>
<property>
  <name>hive.security.metastore.authorization.auth.reads</name>
  <value>false</value>
</property>
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider</value>
</property>
<property>
  <name>hive.metastore.authorization.storage.checks</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.cache.pinobjtypes</name>
  <value>Table,Database,Type,FieldSchema,Order</value>
</property>
<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>5s</value>
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1800s</value>
</property>
<property>
  <name>hive.metastore.connect.retries</name>
  <value>24</value>
</property>
<property>
  <name>hive.metastore.execute.setugi</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.failure.retries</name>
  <value>24</value>
</property>
<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/[email protected]</value>
</property>
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.server.max.threads</name>
  <value>100000</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://masternode1.com:9083</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/platform</value>
</property>
Our whole development team is unsure why, and from where, the path /apps/hive/warehouse is being picked up, even after we override it with our custom hive-site.xml.
Is it that some internal HDFS mechanism uses this location to store its intermediate results, and therefore requires EXECUTE permission on this path?
Per policy we cannot grant 777-level access on /apps/hive/warehouse to users, for two reasons: other sets of users may be added in the future, and granting users 777 on the warehouse is not safe.
Upvotes: 1
Views: 1285
Reputation: 191743
The Hive metastore has its own hive-site.xml that determines where Hive tables are located on HDFS. This property is determined by HiveServer, not by Spark.
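Concretely, the metastore's own hive-site.xml (on the metastore host, not the copy bundled with your job) typically carries the property below; on HDP its stock value is exactly the path from your error (snippet reconstructed for illustration):
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/apps/hive/warehouse</value>
</property>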
For example, on a Hortonworks cluster, notice that the warehouse has 777 permissions and is owned by the hive user and the hdfs superuser group.
$ hdfs dfs -ls /apps/hive
Found 2 items
drwxrwxrwx - hive hadoop 0 2018-02-27 20:20 /apps/hive/auxlib
drwxrwxrwx - hive hdfs 0 2018-06-27 10:27 /apps/hive/warehouse
According to your error, that directory exists, but no user can read, write or list the contents of that warehouse directory.
Ideally, I would suggest not putting the warehouse in an HDFS user directory.
Upvotes: 2
Reputation: 11449
This seems like a permission issue on HDFS for the user "diplatform".
Log in as an admin user and perform the following operations:
hadoop fs -mkdir -p /apps/hive/warehouse
hadoop fs -mkdir -p /tmp
hadoop fs -chmod -R 777 /apps/hive/warehouse
hadoop fs -chmod 777 /tmp
Then run your CREATE DATABASE statement as "diplatform".
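For instance, from the Spark session (the database name is hypothetical):
// Run as "diplatform" once the directories above have usable permissions.
_sparkSession.sql("CREATE DATABASE IF NOT EXISTS platform_db");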
Upvotes: 0