MikA

Reputation: 5542

executing long running hive queries from remote machine

I have to execute long-running (~10 hours) Hive queries from my local server using a Python script. My target Hive server is in an AWS cluster.

I've tried to execute it using pyhs2's execute('<command>')

and

paramiko's exec_command('hive -e "<command>"')

In both cases the query runs on the Hive server and completes successfully. The issue is that even after the query has completed, my parent Python script keeps waiting for a return value and stays in interruptible sleep (Sl) state indefinitely!

Is there any way I can make my script work correctly using pyhs2 or paramiko? Or is there a better option available in Python?

Upvotes: 3

Views: 939

Answers (1)

Ajith Sasidharan

Reputation: 1155

As I mentioned before, I faced a similar issue in my performance environment. My use case was running queries through the pyhs2 module with the Hive Tez execution engine. Tez generates a lot of logs (basically on a seconds scale); those logs get captured in the stdout variable and are only handed back once the query completes successfully. The way to overcome this is to stream the output as it is generated, as shown below:

    for line in iter(lambda: stdout.readline(2048), ""):
    print line

But for this you will have to connect to the cluster natively using paramiko or Fabric and then issue the hive command via the CLI or beeline.
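A rough sketch of that approach with paramiko might look like the following; the host, username, key path, and query are placeholder values you would replace with your own:

    import paramiko

    # Placeholder connection details -- substitute your own cluster values
    HOST = "emr-master-node"
    USER = "hadoop"
    KEY_FILE = "/path/to/key.pem"
    QUERY = "SELECT count(*) FROM some_table"

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(HOST, username=USER, key_filename=KEY_FILE)

    # get_pty=True folds stderr (where Hive/Tez progress logs go) into stdout
    stdin, stdout, stderr = client.exec_command('hive -e "%s"' % QUERY, get_pty=True)

    # Stream output as it is produced instead of waiting for the command to
    # finish, so the remote buffers never fill up and block the session.
    for line in iter(lambda: stdout.readline(2048), ""):
        print(line, end="")

    exit_status = stdout.channel.recv_exit_status()  # returns once hive exits
    client.close()
    print("hive exited with status", exit_status)

Draining stdout continuously also addresses the original hang: if the output buffer fills up, the remote process can block and the local script appears to wait forever.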

Upvotes: 0
