Our team uses Python to execute Hive queries. However, a heavy query always blocks other lightweight queries, which then have to wait more than an hour.
Is it possible to set the priority or vCPU resources for an individual connection?
Is setting "yarn.nodemanager.resource.cpu-vcores" or "mapred.job.priority" in the configuration a solution?
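from pyhive import hive

# Per-session Hive settings, functionally the same as running SET key=value after connecting
configuration = {
    "mapred.job.priority": "LOW",
    "yarn.nodemanager.resource.cpu-vcores": "2",  # session settings are passed as strings
}
# configuration = {}
# ip, auth and db_name are defined elsewhere
con = hive.connect(ip, port=10000, auth=auth, kerberos_service_name="hive", database=db_name, configuration=configuration)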
If so, how can I fix the "It is not in list of params that are allowed to be modified at runtime" error?
Thanks
Since you are connecting directly to Hive, it seems that the value of hive.security.authorization.sqlstd.confwhitelist.append in your settings is incorrect or too strict, so it does not allow your properties to be set at runtime.
The solution is described here. An example of hiveserver2-site.xml values, written as regular expressions:
<property>
<name>hive.security.authorization.sqlstd.confwhitelist.append</name>
<value>mapred.*|hive.*|mapreduce.*|spark.*</value>
</property>
<property>
<name>hive.security.authorization.sqlstd.confwhitelist</name>
<value>mapred.*|hive.*|mapreduce.*|spark.*</value>
</property>
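To check which property names such a whitelist would accept before restarting HiveServer2, the patterns can be tested locally. This is a minimal sketch, assuming Hive compares the whole property name against the combined pattern (Java `Matcher.matches()` semantics), which Python's `re.fullmatch` reproduces for a simple pattern like this one. The helper name `allowed_at_runtime` is hypothetical.

```python
import re

# Whitelist value copied from the hiveserver2 example above.
WHITELIST = r"mapred.*|hive.*|mapreduce.*|spark.*"


def allowed_at_runtime(name: str, whitelist: str = WHITELIST) -> bool:
    """Hypothetical helper: True if the whole property name matches the whitelist.

    Assumption: Hive requires a full match of the name, so re.fullmatch is used
    rather than re.search.
    """
    return re.fullmatch(whitelist, name) is not None


for prop in ("mapred.job.priority", "yarn.nodemanager.resource.cpu-vcores"):
    print(prop, allowed_at_runtime(prop))
```

With this whitelist, mapred.job.priority is accepted, while yarn.nodemanager.resource.cpu-vcores matches none of the patterns and would still be rejected at runtime.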