Filipe Aleixo

Reputation: 4244

Executing awk command from python

I am trying to execute the following awk command from a Python script:

awk 'BEGIN {FS="\t"}; {print $1"\t"$2}' file_a > file_b

For this, I tried to use subprocess as follows:

subprocess.check_output(["awk", 'BEGIN {FS="\t"}; {print $1"\t"$2}',
                         file_a, ">",
                         file_b])

where file_a and file_b are strings pointing to the path of the files.

From this, I am getting the error

awk: cannot open > (No such file or directory)

I'm sure I'm passing the arguments to subprocess the wrong way, but I can't figure out what's wrong.

Upvotes: 0

Views: 2088

Answers (1)

obskyr

Reputation: 1459

While it may look like it in your shell of choice, >, <, and | are not actually passed as arguments to the program you run. Rather, they're shell syntax: the shell interprets them and sets up the redirection or pipe before the program starts, so the program never gets to see them.
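To see this for yourself, here is a small sketch that runs a child Python process (used instead of awk purely for illustration) and reports the arguments it received. The helper name show_args is made up for this example.

```python
import subprocess
import sys

def show_args(*args):
    """Run a child Python process and return the argument list it saw."""
    child_code = "import sys; print(sys.argv[1:])"
    out = subprocess.check_output([sys.executable, "-c", child_code, *args])
    return out.decode().strip()

# No shell is involved, so ">" arrives as an ordinary argument,
# just like the file names around it.
print(show_args("file_a", ">", "file_b"))
```

The child receives ">" as a plain string, which is why awk tried to open a file literally named ">".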

Since they're handled by the shell, and not by the OS or the program, you have to reproduce their effects yourself with the normal facilities the language gives you. In your case, since you're trying to redirect output to a file, simply use Python's open() as you normally would. The subprocess functions accept stdout, stdin, and stderr arguments, and you can pass any file object backed by a real file descriptor (such as one returned by open()) for those.

Check it out:

import subprocess
# Open file_b for writing and hand it to awk as its stdout.
with open(file_b, 'wb') as f:
    subprocess.call(["awk", 'BEGIN {FS="\t"}; {print $1"\t"$2}', file_a], stdout=f)

Since subprocess.check_output already captures stdout itself, it doesn't accept the stdout argument (passing one raises a ValueError). Using subprocess.call avoids this. If you also need the output later in the script, you can instead assign the return value of check_output to a variable, and then save that to file_b.
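The check_output variant mentioned above might look roughly like this. It's only a sketch that assumes awk is on your PATH; the function name extract_columns is made up for the example.

```python
import subprocess

def extract_columns(src, dst):
    """Write the first two tab-separated columns of src to dst and return them."""
    # Raw string, so awk rather than Python interprets the \t escapes.
    program = r'BEGIN {FS="\t"}; {print $1"\t"$2}'
    # check_output captures awk's stdout and returns it as bytes;
    # it raises CalledProcessError if awk exits with a non-zero status.
    output = subprocess.check_output(["awk", program, src])
    with open(dst, "wb") as f:
        f.write(output)
    return output
```

You would call it as extract_columns(file_a, file_b) and keep the returned bytes for whatever you need later in the script.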

If you use a lot of shell commands, you might also want to check out Plumbum, which gives you a large set of fairly silly shell-like operator overloads.

Upvotes: 2
