Reputation: 15
I am trying to run a hive/spark-submit command from Python using the subprocess module, and I want to write its output to a log file. Can you please help me with this?
import subprocess
file = ["hive" "-f" "test.sql"]
process = subprocess.Popen(file,shell=False,stderr=subprocess.PIPE,
stdout=subprocess.STDOUT,universal_newlines=True)
process.wait()
out,err=process.communicate()
I need to write the contents of out to a new file, let's say test.log or test.txt.
Upvotes: 0
Views: 372
Reputation: 189477
You have an error in your command; the list needs to have commas between the strings (otherwise you are pasting together the individual strings into a single long string, "hive-ftest.sql"!)
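To see the effect of the missing commas, here is a small sketch (the variable names `broken` and `fixed` are just for illustration). Adjacent string literals are concatenated by Python before the list is even built:

```python
# Adjacent string literals with no comma between them are joined into one string
broken = ["hive" "-f" "test.sql"]
# With commas, each argument is a separate list element
fixed = ["hive", "-f", "test.sql"]

print(broken)      # ['hive-ftest.sql']
print(len(fixed))  # 3
```

With `shell=False`, the first list element is taken as the program name, so the broken version asks for a program literally called `hive-ftest.sql`.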
As pointed out in the subprocess documentation, you should generally avoid bare Popen when you can. If all you need is for a command to run to completion, subprocess.run or its legacy siblings check_call et al. should be preferred for simplicity and robustness.
import subprocess
# Renamed the variable; this is not a "file" by any stretch
cmd = ["hive", "-f", "test.sql"]
# The question asks for test.log; adjust the path as needed
filename = "test.log"
with open(filename, "wb") as outputfile:
    process = subprocess.run(cmd, stdout=outputfile, check=True)
Specifying a binary output mode avoids having Python try to infer anything about the encoding of the bytes emitted; if you need to process text, you might want to add an encoding= keyword argument to the subprocess call.
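If you do need the output as text in Python rather than just in a file, something like the following might work. This is only a sketch: the helper name `run_capture_text` is made up, and UTF-8 is an assumption about what the command emits.

```python
import subprocess
import sys


def run_capture_text(cmd, encoding="utf-8"):
    """Run cmd and return its standard output as a list of decoded lines."""
    # encoding= makes subprocess decode the captured bytes into str
    result = subprocess.run(cmd, stdout=subprocess.PIPE, encoding=encoding, check=True)
    return result.stdout.splitlines()
```

For the command in the question, the call would be `run_capture_text(["hive", "-f", "test.sql"])`. Keep in mind that this holds the entire output in memory, so writing straight to a file is the better choice for large results.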
Not specifying any destination for stderr means error messages will be displayed to the user, which is probably a useful simplification if the tool will be invoked interactively. If not, you will probably need to capture any diagnostic messages and write them to a log file or similar.
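For the unattended case, one option is to send stderr to the same file as stdout. A minimal sketch, assuming you want both streams interleaved in a single log (the function name `run_and_log` is hypothetical):

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, writing both stdout and stderr to log_path; return the exit status."""
    with open(log_path, "wb") as logfile:
        # stderr=subprocess.STDOUT redirects error output into the stdout destination
        result = subprocess.run(cmd, stdout=logfile, stderr=subprocess.STDOUT)
    return result.returncode
```

Called as `run_and_log(["hive", "-f", "test.sql"], "test.log")`, this would put diagnostics and results in the same file. If you would rather keep them apart, open a second file and pass it as `stderr=` instead.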
check=True specifies that Python should check that the command succeeds, and raise an exception if not. This is usually good hygiene, but might need to be tweaked if the command you run could emit an error status in situations where your use case could nevertheless be completed, or if you need to avoid tracebacks in unattended use.
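If the traceback is the problem, you could keep `check=True` and catch the exception it raises, `subprocess.CalledProcessError`. A rough sketch (the helper name `run_to_file` and the message format are my own choices):

```python
import subprocess
import sys


def run_to_file(cmd, out_path):
    """Run cmd with stdout written to out_path; return True on success, False on failure."""
    try:
        with open(out_path, "wb") as outputfile:
            subprocess.run(cmd, stdout=outputfile, check=True)
    except subprocess.CalledProcessError as exc:
        # check=True raises this when the command exits with a non-zero status
        print(f"{cmd[0]} exited with status {exc.returncode}", file=sys.stderr)
        return False
    return True
```

The caller can then decide what a failure means, for example by retrying or by exiting with its own status code.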
shell=False is the default, and so I omitted that.
I can see no reason to store the command in a variable, but perhaps you have one. Inlining the command will avoid having to come up with a useful name for the variable (^:
Upvotes: 2