paresh Bapna

Reputation: 77

Spark program is internally calling HDFS: /apps/hive/warehouse

Scenario/Code Details


I am creating a SparkSession object to store data into a Hive table, as follows:

_sparkSession = SparkSession.builder().
                    config(_sparkConf).
                    config("spark.sql.warehouse.dir", "/user/platform").
                    enableHiveSupport().
                    getOrCreate();

After deploying my JAR to the server, I get the following exception:

Caused by: org.apache.spark.sql.AnalysisException:
org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:org.apache.hadoop.security.AccessControlException:
Permission denied: user=diplatform, access=EXECUTE,
inode="/apps/hive/warehouse":hdfs:hdfs:d---------
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)

In my hive-site.xml I set the configuration below. We bundle this XML with our Spark code so that the default XML at /etc/hive/conf is overridden:

<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>

<property>
  <name>hive.security.metastore.authorization.auth.reads</name>
  <value>false</value>
</property>

<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider</value>
</property>

<property>
  <name>hive.metastore.authorization.storage.checks</name>
  <value>false</value>
</property>

<property>
  <name>hive.metastore.cache.pinobjtypes</name>
  <value>Table,Database,Type,FieldSchema,Order</value>
</property>

<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>5s</value>
</property>

<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1800s</value>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>24</value>
</property>

<property>
  <name>hive.metastore.execute.setugi</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.failure.retries</name>
  <value>24</value>
</property>

<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/[email protected]</value>
</property>

<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>

<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.server.max.threads</name>
  <value>100000</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://masternode1.com:9083</value>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/platform</value>
</property>
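
One way to double-check which value a given hive-site.xml actually carries is to read the property straight out of the file. The following is a minimal sketch using only the JDK's XML parser. The class and method names (`HiveSiteLookup`, `lookup`) are hypothetical, and it assumes the properties are wrapped in the usual `<configuration>` root element. It only shows what the client-side file says; it does not show what the metastore service itself is configured with.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class HiveSiteLookup {

    /** Returns the <value> of the first <property> whose <name> equals key, or null. */
    static String lookup(String xml, String key) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                Node name = prop.getElementsByTagName("name").item(0);
                Node value = prop.getElementsByTagName("value").item(0);
                if (name != null && value != null
                        && key.equals(name.getTextContent().trim())) {
                    return value.getTextContent().trim();
                }
            }
            return null; // property not present in this file
        } catch (Exception e) {
            throw new RuntimeException("Could not parse hive-site.xml content", e);
        }
    }

    public static void main(String[] args) {
        // Assumed wrapper: the real file encloses properties in <configuration>.
        String xml = "<configuration>"
                + "<property><name>hive.metastore.warehouse.dir</name>"
                + "<value>/user/platform</value></property>"
                + "</configuration>";
        System.out.println(lookup(xml, "hive.metastore.warehouse.dir")); // prints /user/platform
    }
}
```

If this prints the expected path but the exception still names /apps/hive/warehouse, the path is coming from somewhere other than this file.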

Questions:


  1. Why, and from where, is the path /apps/hive/warehouse being picked up, even though we override the defaults with our custom hive-site.xml?

  2. Does HDFS internally use this location to store intermediate results, and is that why execute permission on this path is required?

As per policy, we cannot give users 777 access on /apps/hive/warehouse, for two reasons. First, a different set of users may be added in the future. Second, it is not safe to give users 777 on the warehouse.

  3. Are the above two reasons correct, or is there some workaround?

Upvotes: 1

Views: 1285

Answers (2)

OneCricketeer

Reputation: 191743

The Hive metastore has its own XML file that determines where Hive tables are located on HDFS. This property is determined by HiveServer, not Spark.

For example, on a Hortonworks cluster, notice that the warehouse has 777 permissions and is owned by the hive user and the hdfs superuser group.

$ hdfs dfs -ls /apps/hive
Found 2 items
drwxrwxrwx   - hive hadoop          0 2018-02-27 20:20 /apps/hive/auxlib
drwxrwxrwx   - hive hdfs            0 2018-06-27 10:27 /apps/hive/warehouse

According to your error, that directory exists, but no user can read, write or list the contents of that warehouse directory.
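
To make the mode string in the error concrete, here is a minimal Java sketch of how a POSIX-style mode such as `d---------` relates to the EXECUTE check. It is a simplified model, not the real `FSPermissionChecker`: it ignores ACLs and the superuser bypass, and the names `HdfsModeCheck` and `hasExecute` are made up for illustration. It also assumes `diplatform` is neither the owner (`hdfs`) nor a member of the group `hdfs`, so the "other" bits apply.

```java
public class HdfsModeCheck {

    /** Which of the three permission triplets applies to the caller. */
    enum Who { OWNER, GROUP, OTHER }

    /**
     * Reads the execute bit for the given class of user from a 10-character
     * mode string such as "drwxr-x---". For a directory, execute is what
     * allows traversing into it to reach its children.
     */
    static boolean hasExecute(String mode, Who who) {
        if (mode == null || mode.length() != 10) {
            throw new IllegalArgumentException("Expected a 10-character mode string");
        }
        // Layout: 1 type char, then rwx for owner, group, other.
        int tripletStart = 1 + who.ordinal() * 3;
        char executeBit = mode.charAt(tripletStart + 2);
        // 't' in the last position means sticky bit plus execute.
        return executeBit == 'x' || executeBit == 't';
    }

    public static void main(String[] args) {
        // Mode from the exception in the question.
        System.out.println(hasExecute("d---------", Who.OTHER)); // prints false
        // Mode from the Hortonworks listing above.
        System.out.println(hasExecute("drwxrwxrwx", Who.OTHER)); // prints true
    }
}
```

With `d---------` every triplet is empty, so the check fails for owner, group, and other alike, which matches the `access=EXECUTE` denial in the stack trace.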

Ideally, I would suggest not putting the warehouse in an HDFS user directory.

Upvotes: 2

vaquar khan

Reputation: 11449

This looks like a permission issue on HDFS for the user "diplatform".

Log in as an admin user and run the following commands:

hadoop fs -mkdir -p /apps/hive/warehouse
hadoop fs -mkdir /tmp
hadoop fs -chmod -R 777 /apps/hive/warehouse
hadoop fs -chmod 777 /tmp

Then run the create database statement again as "diplatform".

Upvotes: 0
