user1189851

Reputation: 5041

grep/zgrep within python using subprocess

I have a directory of TSV files. Some are gzip-compressed (*.tsv.gz) and some are not (*.tsv).

I want to grep these files for a string and print each matching line on its own line.

I have a function that takes the input directory where the *.tsv and *.tsv.gz files are stored and the string to search for.

import sys, os, traceback,subprocess,gzip,glob
def filter_from_tsvs(input_dir,string):

    tsvs = glob.glob(os.path.join(input_dir,'*.tsv*'))
    open_cmd=open
    for tsvfile in tsvs:
        print os.path.splitext
        extension = os.path.splitext(tsvfile)[1]
        if extension == ".gz":
          open_cmd = gzip.open
    print open_cmd
    try:
        print subprocess.check_output('grep string tsvfile', shell=True)

    except Exception as e:
        print "%s" %e
        print "%s" %traceback.format_exc()
    return

I have also tried to use:

         try:
             fname = open_cmd(tsvfile,"r")
             print "opened"
             print subprocess.check_output('grep string fname', shell=True)

I got this error:

gzip: tsvfile.gz: No such file or directory
Command 'zgrep pbuf tsvfile' returned non-zero exit status 2
Traceback (most recent call last):
  File "ex.py", line 23, in filter_from_maintsvs
    print subprocess.check_output('zgrep pbuf tsvfile', shell=True)
  File "/datateam/tools/opt/lib/python2.7/subprocess.py", line 544, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command 'zgrep pbuf tsvfile' returned non-zero exit status 2

How can I use grep/zgrep within Python?

Upvotes: 1

Views: 5218

Answers (2)

Anand Satya

Reputation: 81

I got the following solution after going through a blog and it worked for me :)

import subprocess
import signal

output = subprocess.check_output('grep string tsvfile', shell=True, preexec_fn=lambda: signal.signal(signal.SIGPIPE, signal.SIG_DFL))

print output  

Hints:

  • If the string is not found, grep exits with status 1 and check_output raises subprocess.CalledProcessError.
  • check_output is available since Python 2.7. On older versions, subprocess.Popen with communicate() is an alternative.
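
To illustrate the first hint, here is a minimal sketch of treating "no match" as an empty result instead of an error. It is written in Python 3 syntax, and grep_lines is a hypothetical helper name, not something from the original code. It assumes a grep binary is on the PATH.

```python
import subprocess

def grep_lines(pattern, path):
    """Return the lines of path that match pattern, or [] when nothing matches."""
    try:
        # -e marks the next argument as the pattern, even if it starts with "-".
        out = subprocess.check_output(["grep", "-e", pattern, path])
    except subprocess.CalledProcessError as err:
        if err.returncode == 1:
            # Exit status 1 only means that no line was selected.
            return []
        # Exit status 2 or higher is a real failure (e.g. a missing file).
        raise
    return out.decode("utf-8", errors="replace").splitlines()
```

Calling grep_lines("pbuf", "some.tsv") would then give a plain list that can be printed line by line, while a missing file still surfaces as an exception.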

Upvotes: 3

Martin Konecny

Reputation: 59611

Some comments on your code:

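At the moment the command is the literal text 'grep string tsvfile', so grep searches for the word "string" in a file named "tsvfile" instead of using the values of your variables. Pass the values as a list of arguments instead: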

subprocess.check_output(['grep', string, tsvfile])
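
As a rough sketch of why the list form helps: each element reaches grep as one argument without going through a shell, so a search string containing spaces or quotes needs no escaping. The example is in Python 3 syntax, count_matching_lines is a hypothetical name, and the -F flag (match fixed text rather than a regular expression) is my own assumption about what is wanted here.

```python
import subprocess

def count_matching_lines(string, tsvfile):
    """Count the lines of tsvfile that contain string as literal text."""
    # -c: print only the count; -F: fixed text, not a regex; -e: pattern follows.
    cmd = ["grep", "-c", "-F", "-e", string, tsvfile]
    try:
        out = subprocess.check_output(cmd)
    except subprocess.CalledProcessError as err:
        if err.returncode != 1:
            raise
        # With -c, grep still prints the count (0) when it exits with status 1.
        out = err.output
    return int(out.decode().strip())
```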

Next, if you're using zgrep then you don't need to open your files with gzip.open. You can call zgrep on a tsv.gz file, and it will take care of opening it without any extra work from you. So instead try calling

subprocess.check_output(['zgrep', string, tsvfile]) 

Note that zgrep will also work on uncompressed tsv files, so you don't need to keep switching between grep and zgrep.
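
Putting these points together, the function from the question could look roughly like the sketch below. It is in Python 3 syntax, assumes zgrep is installed and on the PATH, and yields results instead of printing them, which is my own choice rather than something the question asked for.

```python
import glob
import os
import subprocess

def filter_from_tsvs(input_dir, string):
    """Yield (path, matching_line) for each *.tsv and *.tsv.gz file in input_dir."""
    for tsvfile in sorted(glob.glob(os.path.join(input_dir, "*.tsv*"))):
        try:
            # zgrep reads both gzip-compressed and plain files.
            out = subprocess.check_output(["zgrep", string, tsvfile])
        except subprocess.CalledProcessError as err:
            if err.returncode == 1:
                continue  # no match in this file, move on to the next one
            raise         # anything else is a real error
        for line in out.decode("utf-8", errors="replace").splitlines():
            yield tsvfile, line
```

A caller can then print each match on its own line with a simple loop such as `for path, line in filter_from_tsvs(input_dir, string): print(line)`.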

Upvotes: 2
