Renganathan Rajagopal

Reputation: 15

How to run a subprocess and store the results in a file?

I am trying to run a hive/spark submit from Python using the subprocess module, and I want to write the command's output to a file (a log file). Can you please help me with this?

import subprocess

file = ["hive" "-f" "test.sql"]

process = subprocess.Popen(file,shell=False,stderr=subprocess.PIPE,
                           stdout=subprocess.STDOUT,universal_newlines=True)
process.wait()
out,err=process.communicate()

I need to write the contents of out to a new file, say test.log or test.txt.

Upvotes: 0

Views: 372

Answers (1)

tripleee

Reputation: 189477

You have an error in your command; the list needs to have commas between the strings (otherwise Python concatenates the adjacent string literals into a single long string "hive-ftest.sql"!)
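A quick way to see the difference, as a minimal sketch (the variable names broken and fixed are just for illustration):

```python
# Adjacent string literals with no comma between them are joined
# into one string when the code is compiled.
broken = ["hive" "-f" "test.sql"]

# With commas, each argument is a separate list element.
fixed = ["hive", "-f", "test.sql"]

print(broken)  # ['hive-ftest.sql']
print(fixed)   # ['hive', '-f', 'test.sql']
```

With the broken list, subprocess would look for a single program literally named hive-ftest.sql.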

As pointed out in the subprocess documentation, you should generally avoid bare Popen when you can. If all you need is for a command to run to completion, subprocess.run or its legacy siblings check_call et al. should be preferred for simplicity and robustness.

import subprocess

# Renamed the variable; this is not a "file" by any stretch
cmd = ["hive", "-f", "test.sql"]
filename = "test.log"  # the log file name suggested in the question
with open(filename, "wb") as outputfile:
    process = subprocess.run(cmd, stdout=outputfile, check=True)

Specifying a binary output mode avoids having Python try to infer anything about the encoding of the bytes emitted; if you need to process text, you might want to add an encoding= keyword argument to the subprocess call.
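If you do want to work with the output as text in Python rather than send it straight to a file, something like the following might do. This is only a sketch: I am assuming the tool emits UTF-8, and the command here is a stand-in that runs the Python interpreter so the example works without Hive installed.

```python
import subprocess
import sys

# Stand-in command (an assumption for the demo); in real use this
# would be something like ["hive", "-f", "test.sql"].
cmd = [sys.executable, "-c", "print('row1'); print('row2')"]

result = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,  # capture stdout in memory instead of a file
    encoding="utf-8",        # assumed encoding; decodes the bytes to str
    check=True,
)

# result.stdout is now a str, so ordinary string processing applies.
lines = result.stdout.splitlines()
```

Capturing into memory is fine for modest amounts of output; for very large results, writing directly to a file as above is likely the better choice.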

Not specifying any destination for stderr means error messages will be displayed to the user, which is probably a useful simplification if the tool will be invoked interactively. If not, you will probably need to capture any diagnostic messages and write them to a log file or something.
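For unattended use, one option is to send stderr to the same file as stdout, which seems to be what the question's code was aiming for. A rough sketch follows; run_and_log is a hypothetical helper name, and the demo command is a stand-in using the Python interpreter rather than Hive.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_log(cmd, log_path):
    """Run cmd, writing both stdout and stderr to log_path.

    Returns the command's exit status. (Hypothetical helper.)
    """
    with open(log_path, "wb") as logfile:
        completed = subprocess.run(
            cmd,
            stdout=logfile,
            stderr=subprocess.STDOUT,  # merge stderr into the stdout file
        )
    return completed.returncode


# Stand-in command that writes to both streams; replace with
# ["hive", "-f", "test.sql"] in real use.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('out'); print('err', file=sys.stderr)",
]
demo_log = Path(tempfile.mkdtemp()) / "test.log"
exit_code = run_and_log(demo_cmd, demo_log)
```

Note that subprocess.STDOUT is only meaningful as a value for stderr; passing it as stdout, as in the question, is an error.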

check=True specifies that Python should check that the command succeeds, and raise an exception if not. This is usually good hygiene, but might need to be tweaked if the command you run could emit an error status in situations where your use case could nevertheless be completed, or if you need to avoid tracebacks in unattended use.
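If a traceback is not acceptable, you can catch the exception yourself. Here is a minimal sketch; run_checked is a hypothetical name, and the failing command is a stand-in which simply exits with status 3.

```python
import subprocess
import sys


def run_checked(cmd):
    """Return True if cmd exits with status 0; report and return False otherwise."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as exc:
        # exc.returncode holds the non-zero exit status of the command.
        print(f"command failed with exit status {exc.returncode}", file=sys.stderr)
        return False
    return True


# Stand-in for a failing command (an assumption for the demo).
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
ok = run_checked(failing_cmd)
```

What to do on failure (retry, log, abort) depends on your use case; this only shows where that decision would go.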

shell=False is the default, and so I omitted that.

I can see no reason to store the command in a variable, but perhaps you have one. Inlining the command will avoid having to come up with a useful name for the variable (^:

Upvotes: 2
