Ehsan Toghian

Reputation: 548

Execute zgrep command and write results to a file

I have a folder containing many files named file_1.gz through file_250.gz, and the number keeps growing.

A zgrep command that searches through them looks like this:

zgrep -Pi "\"name\": \"bob\"" ../../LM/DATA/file_*.gz

I want to execute this command from Python using subprocess, like this:

out_file = os.path.join(out_file_path, file_name)
search_command = ['zgrep', '-Pi', '"name": "bob"', '../../LM/DATA/file_*.gz']
process = subprocess.Popen(search_command, stdout=out_file)

The problem is that out_file is created but empty, and this error is raised:

<type 'exceptions.AttributeError'>
'str' object has no attribute 'fileno'

What is the solution?

Upvotes: 0

Views: 1935

Answers (4)

jfs

Reputation: 414265

There are two issues:

  1. stdout must be a file object (something with a valid .fileno() method) or a file descriptor, not a filename string.
  2. It is the shell that expands *, and subprocess does not invoke the shell unless you ask for it. You can use glob.glob() to expand the file pattern yourself.

Example:

#!/usr/bin/env python
import os
from glob import glob
from subprocess import check_call

search_command = ['zgrep', '-Pi', '"name": "bob"'] 
out_path = os.path.join(out_file_path, file_name)
with open(out_path, 'wb', 0) as out_file:
    check_call(search_command + glob('../../LM/DATA/file_*.gz'), 
               stdout=out_file)
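
One edge case worth guarding against: if the pattern matches no files, glob() returns an empty list, and zgrep with no file arguments reads standard input instead. A minimal sketch of a helper that expands the pattern and fails early in that case follows; the name build_search_command is hypothetical, and the code assumes Python 3:

```python
import glob


def build_search_command(pattern, file_glob):
    """Return a zgrep argument list with file_glob expanded by Python.

    Hypothetical helper, not part of subprocess or glob.
    """
    # glob() returns matches in arbitrary order; sort for stable output.
    files = sorted(glob.glob(file_glob))
    if not files:
        # With no file arguments zgrep would wait on standard input.
        raise ValueError('no files match %r' % file_glob)
    return ['zgrep', '-Pi', pattern] + files
```

The returned list can be passed to check_call() together with stdout=out_file, as in the example above.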

Upvotes: 1

Ehsan Toghian

Reputation: 548

My problem consisted of two parts:

  1. The first part is answered by @liborm as well.
  2. The second part concerns the files that zgrep searches. When we write a command like zgrep "pattern" path/to/files/*.gz, bash automatically replaces *.gz with all files ending in .gz. When I ran the command in a subprocess, nothing replaced *.gz with the real file names, so the error gzip: ../../LM/DATA/file_*.gz: No such file or directory was raised. I solved it with:

    for file in os.listdir(archive_files_path):
        if file.endswith(".gz"):
            search_command.append(os.path.join(archive_files_path, file))
    

Upvotes: 0

liborm

Reputation: 2724

You need to pass a file object:

process = subprocess.Popen(search_command, stdout=open(out_file, 'w'))

Citing the manual, emphasis mine:

stdin, stdout and stderr specify the executed program’s standard input, standard output and standard error file handles, respectively. Valid values are PIPE, an existing file descriptor (a positive integer), an existing file object, and None. PIPE indicates that a new pipe to the child should be created. With the default settings of None, no redirection will occur; the child’s file handles will be inherited from the parent.

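Combined with LFJ's answer: using the convenience functions is recommended, and you need shell=True for the wildcard (*) to be expanded. Because ' '.join() adds no shell quoting, arguments containing spaces or quotes (such as the pattern in the question) have to be quoted for the shell before joining: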

subprocess.call(' '.join(search_command), stdout=open(out_file, 'w'), shell=True)

Or, since you're using the shell anyway, you can use shell redirection as well:

subprocess.call("%s > %s" % (' '.join(search_command), out_file), shell=True)
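
When a command line is built as a single string for shell=True, each argument needs shell quoting, otherwise a pattern such as "name": "bob" is split into two words by the shell. A minimal sketch, assuming Python 3 (shlex.quote); the helper name shell_line is hypothetical. The file pattern is deliberately left unquoted so that the shell still expands the *:

```python
import shlex


def shell_line(args, file_glob, out_path):
    """Build a shell command string: quoted args, raw glob, redirect to out_path."""
    # Quote every fixed argument so spaces and quotes survive the shell.
    quoted = ' '.join(shlex.quote(a) for a in args)
    # file_glob stays unquoted on purpose: the shell must expand it.
    return '%s %s > %s' % (quoted, file_glob, shlex.quote(out_path))


line = shell_line(['zgrep', '-Pi', '"name": "bob"'],
                  '../../LM/DATA/file_*.gz', 'out.txt')
print(line)
```

The resulting string can then be passed to subprocess.call(line, shell=True). Leaving the glob unquoted is only safe when the pattern itself is trusted and contains no spaces.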

Upvotes: 1

Fujiao Liu

Reputation: 2253

If you want to execute a shell command and get its output, try subprocess.check_output(). It is very simple, and you can easily save the output to a file.

# check_output returns the command's stdout (bytes on Python 3)
command_output = subprocess.check_output(your_search_command, shell=True)
# append in binary mode so the output is written unchanged
with open(out_file, 'ab') as f:
    f.write(command_output)
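
One caveat, assuming zgrep follows grep's exit-status convention (0 for a match, 1 for no match, 2 for an error): check_output() raises CalledProcessError on any non-zero status, so a search that simply finds nothing would look like a failure. A minimal sketch that treats status 1 as "no matches" follows; the helper name save_matches is hypothetical, and cmd is expected to be an argument list with the file names already expanded:

```python
import subprocess


def save_matches(cmd, out_path):
    """Run cmd, append its stdout to out_path, return the number of bytes written."""
    try:
        output = subprocess.check_output(cmd)
    except subprocess.CalledProcessError as exc:
        if exc.returncode != 1:
            raise  # a real error, not just "no matches"
        output = exc.output  # whatever was printed before exiting with 1
    # Binary append mode: check_output returns bytes on Python 3.
    with open(out_path, 'ab') as f:
        f.write(output)
    return len(output)
```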

Upvotes: 0
