Icarus

Reputation: 1463

Set Hive priority for an individual query/connection

Our team uses Python to execute Hive queries. However, a heavy query always blocks the other lightweight queries, which then have to wait more than an hour.

Is it possible to set the priority or vcpu resources for an individual connection?

Is setting the "yarn.nodemanager.resource.cpu-vcores" or "mapred.job.priority" in the configuration a solution?

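from pyhive import hive

# Session-level settings sent when the connection is opened; values are passed as strings
configuration = {
    "mapred.job.priority": "LOW",
    "yarn.nodemanager.resource.cpu-vcores": "2",
}
# configuration = {}

con = hive.connect(ip, port=10000, auth=auth, kerberos_service_name='hive', database=db_name, configuration=configuration)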

If yes, how can I fix the "It is not in list of params that are allowed to be modified at runtime" error?

Thanks

Upvotes: 2

Views: 715

Answers (1)

Cloudkollektiv

Reputation: 14729

Since you are connecting directly to Hive, it seems that the value of hive.security.authorization.sqlstd.confwhitelist.append in your settings is incorrect or too strict, so it does not allow your variables to be set at runtime.

The solution is described here. An example of hiveserver2-site.xml properties, with the values given as regular expressions:

<property>
    <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
    <value>mapred.*|hive.*|mapreduce.*|spark.*</value>
</property>
<property>
    <name>hive.security.authorization.sqlstd.confwhitelist</name>
    <value>mapred.*|hive.*|mapreduce.*|spark.*</value>
</property>
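
To see which of the keys from the question those patterns would cover, here is a minimal sketch. It assumes HiveServer2 requires the whole parameter name to match the whitelist regex, approximated here with Python's re.fullmatch. The helper names is_allowed and split_by_whitelist are hypothetical and not part of Hive or PyHive.

```python
import re

# Whitelist pattern copied from the example properties above.
WHITELIST = r"mapred.*|hive.*|mapreduce.*|spark.*"


def is_allowed(param, pattern=WHITELIST):
    """Return True if the whole parameter name matches the whitelist regex.

    Assumption: the server does a full match on the name, so re.fullmatch
    is used here as an approximation of that check.
    """
    return re.fullmatch(pattern, param) is not None


def split_by_whitelist(configuration, pattern=WHITELIST):
    """Split a settings dict into (allowed settings, rejected keys)."""
    allowed = {k: str(v) for k, v in configuration.items() if is_allowed(k, pattern)}
    rejected = sorted(k for k in configuration if not is_allowed(k, pattern))
    return allowed, rejected


# The two settings from the question.
allowed, rejected = split_by_whitelist({
    "mapred.job.priority": "LOW",
    "yarn.nodemanager.resource.cpu-vcores": 2,
})
print(allowed)   # settings the example whitelist would accept
print(rejected)  # keys that would still be refused at runtime
```

Under that assumption, mapred.job.priority is covered by the mapred.* alternative, while yarn.nodemanager.resource.cpu-vcores matches none of the patterns and would still trigger the runtime error unless a matching pattern is added.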

Upvotes: 2
