Julien Chien

Reputation: 2190

Running grep through Python - doesn't work

I have some code like this:

f = open("words.txt", "w")
subprocess.call(["grep", p, "/usr/share/dict/words"], stdout=f)
f.close()

I want to grep the macOS dictionary for a certain pattern and write the results to words.txt. For example, if I want to do something like grep '\<a.\>' /usr/share/dict/words, I'd run the above code with p = "'\<a.\>'". However, the subprocess call doesn't seem to work properly and words.txt remains empty. Any thoughts on why that is? Also, is there a way to apply a regex to /usr/share/dict/words without calling grep in a subprocess?

edit: When I run grep '\<a.\>' /usr/share/dict/words in my terminal, I get words like: aa ad ae ah ai ak al am an ar as at aw ax ay as results in the terminal (or a file if I redirect them there). This is what I expect words.txt to have after I run the subprocess call.

Upvotes: 0

Views: 1690

Answers (2)

tripleee

Reputation: 189427

Like @woockashek already commented, you are not getting any results because there are no hits on '\<a.\>' in your input file. You are probably actually hoping to find hits for \<a.\> but then obviously you need to omit the single quotes, which are messing you up.

Of course, Python knows full well how to look for a regex in a file.

import re

rx = re.compile(r'\ba.\b')
with open('/usr/share/dict/words', 'r') as reader, open('words.txt', 'w') as writer:
    for line in reader:
        if rx.search(line):
            print(line, file=writer, end='')

The single quotes here are part of Python's string syntax, just like the single quotes on the command line are part of the shell's syntax. In neither case are they part of the actual regular expression you are searching for.

The subprocess.Popen documentation vaguely alludes to the frequently overlooked fact that the shell's quoting is not necessary or useful when you don't have shell=True (which usually you should avoid anyway, for this and other reasons).
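To make that concrete, here is a minimal sketch, assuming Python 3.7 or later. Without shell=True, each list element reaches the child process verbatim, so quotes typed inside the Python string become part of the pattern. The helper names received_argument and grep_to_file are made up for illustration and are not part of subprocess.

```python
import subprocess
import sys


def received_argument(arg):
    """Return the argument exactly as a child process sees it (no shell involved)."""
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    return subprocess.check_output(
        [sys.executable, "-c", child_code, arg], text=True
    )


def grep_to_file(pattern, source, dest):
    """Run grep with an unquoted pattern and write the matching lines to dest."""
    with open(dest, "w") as out:
        # grep exits with status 1 when nothing matches; hand that back to the caller
        return subprocess.call(["grep", pattern, source], stdout=out)


# The single quotes survive, so grep would search for them literally.
print(received_argument(r"'\<a.\>'"))
# Without them, the child receives just the regular expression.
print(received_argument(r"\<a.\>"))
```

With the quotes dropped, a call such as grep_to_file(r'\<a.\>', '/usr/share/dict/words', 'words.txt') should fill words.txt with the same lines the terminal command prints.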

Python unfortunately doesn't support \< and \> as word boundary operators, so we have to use \b instead, which matches at either edge of a word and is functionally equivalent here.
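A small sketch of the difference, using a made-up word list in place of /usr/share/dict/words: in Python's re module, \< and \> are simply escaped literal angle brackets, while \b does the word-boundary job.

```python
import re

rx = re.compile(r'\ba.\b')

# Hypothetical sample lines, shaped like the dictionary file (one word per line)
sample = ['a\n', 'aa\n', 'ad\n', 'aal\n', 'ba\n', 'at\n']
matches = [word.strip() for word in sample if rx.search(word)]
print(matches)

# The grep-style pattern compiles, but only matches literal '<' and '>'
print(re.search(r'\<a.\>', 'aa'))      # no match
print(re.search(r'\<a.\>', '<ab>'))    # matches the literal brackets
```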

Upvotes: 2

Jose F. Gomez

Reputation: 186

The standard input and output channels for the process started by call() are bound to the parent’s input and output. That means the calling program cannot capture the output of the command. Use check_output() to capture the output for later processing:

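import subprocess

# p holds the bare regex, e.g. p = r'\<a.\>' (no shell quotes inside the string)
# check_output() raises CalledProcessError if grep finds no match (exit status 1)
output = subprocess.check_output(['grep', p, '/usr/share/dict/words'], text=True)
with open("words.txt", "w") as f:
    f.write(output)
print(output)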

P.S.: I hope it works; I can't check the answer because I don't have macOS to try it on.

Upvotes: -1
