David

Reputation: 941

Grep and Python

I need a way to search a file, grep-style, using a regular expression given on the Unix command line. For example, when I type this on the command line:

python pythonfile.py 'RE' 'file-to-be-searched'

I need the regular expression 'RE' to be searched in the file and print out the matching lines.

Here's the code I have:

import re
import sys

search_term = sys.argv[1]
f = sys.argv[2]

for line in open(f, 'r'):
    if re.search(search_term, line):
        print line,
        if line == None:
            print 'no matches found'

But when I enter a word that isn't present in the file, 'no matches found' doesn't print.

Upvotes: 94

Views: 390666

Answers (8)

Nick Fortescue

Reputation: 44173

The natural question is why not just use grep?! But assuming you can't...

import re
import sys

with open(sys.argv[2], "r") as file:
    for line in file:
        if re.search(sys.argv[1], line):
            print(line, end='')

Things to note:

  • search instead of match to find the pattern anywhere in the string
  • end='' stops print from adding a newline (the line already ends with one)
  • argv includes the Python file name, so the arguments start at index 1
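
To illustrate the first note, here is a small sketch of the difference between re.search and re.match. The sample line and pattern are made up for illustration.

```python
import re

line = "the quick brown fox\n"   # made-up sample line

# re.match only succeeds if the pattern matches at the start of the string
at_start = re.match("quick", line)

# re.search scans the whole string, which is closer to what grep does
anywhere = re.search("quick", line)

print(at_start)            # None
print(anywhere.group())    # quick
```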

This doesn't handle multiple arguments (like grep does) or expand wildcards (like the Unix shell would). If you wanted this functionality you could get it using the following:

#!/usr/bin/env python3

import re
import sys
import glob

regexp = re.compile(sys.argv[1])
for arg in sys.argv[2:]:
    for fn in glob.iglob(arg):
        with open(fn) as file:
            for line in file:
                if regexp.search(line):
                    print(line, end='')

Upvotes: 100

brunocrt

Reputation: 780

I'm not sure I fully understood your question, but to fix your code, just change your if expression as follows:

import re
import sys

search_term = sys.argv[1]
f = sys.argv[2]
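found = False  # becomes True as soon as any line matches
with open(f, 'r') as file:
    for n, line in enumerate(file, start=1):
        if re.search(search_term, line):
            found = True
            # strip the trailing newline so the message stays on one line
            print(f"{line.rstrip(chr(10))} found at line {n}")
if not found:
    print('no matches found')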

PS: I tested it on Python 3.8.10

If you want to use grep, you could run:

grep -E '(.*)word(.*)' file.txt || echo "pattern not found"

Upvotes: 1

Giancarlo Sportelli

Reputation: 1297

Concise and memory efficient:

#!/usr/bin/env python
# file: grep.py
import re, sys, collections

collections.deque(map(sys.stdout.write,(l for l in sys.stdin if re.search(sys.argv[1],l))),maxlen=0)

It works like egrep (without too much error handling), e.g.:

cat input-file | grep.py "RE"

And here is the one-liner:

cat input-file | python -c "import re,sys,collections;collections.deque(map(sys.stdout.write,(l for l in sys.stdin if re.search(sys.argv[1],l))),maxlen=0)" "RE"

Note that collections.deque(..., maxlen=0) is needed in Python 3 because map now returns a lazy iterator: nothing is written until something consumes it.
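
To see why the deque is needed, here is a small sketch in which a list's append method stands in for sys.stdout.write, so the side effects can be counted. The names written and lazy are made up for illustration.

```python
import collections

written = []  # records side effects so we can see when they happen

lazy = map(written.append, ["a\n", "b\n"])
# Nothing has run yet: in Python 3, map only builds an iterator
count_before = len(written)

# A deque with maxlen=0 consumes the iterator and keeps none of the results
collections.deque(lazy, maxlen=0)
count_after = len(written)

print(count_before, count_after)  # 0 2
```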

Upvotes: 13

Eric

Reputation: 5101

You can use python-textops3:

from textops import *

print('\n'.join(cat(f) | grep(search_term)))

With python-textops3 you can use Unix-like commands with pipes.

Upvotes: 4

richard

Reputation: 21

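The real problem is that the variable line always has a value, so "if line == None:" is never true. The test for "no matches found" should be whether there was a match, so "if line == None:" should be replaced with an "else:" on the match test. Note that such an else runs for every non-matching line; to print the message only once, record whether any line matched and check that after the loop.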
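
One way to report a missing match exactly once is to record whether anything matched and check that after the loop. A minimal sketch, assuming a hypothetical helper report_matches that takes the pattern and an iterable of lines:

```python
import re

def report_matches(pattern, lines):
    """Return the lines to print: the matches, or a single 'no matches found'."""
    found = False
    out = []
    for line in lines:
        if re.search(pattern, line):
            found = True
            out.append(line.rstrip("\n"))
    # Checked once, after every line has been examined
    if not found:
        out.append("no matches found")
    return out

print(report_matches("b.r", ["foo\n", "bar\n"]))   # ['bar']
print(report_matches("xyz", ["foo\n", "bar\n"]))   # ['no matches found']
```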

Upvotes: 2

miku

Reputation: 188084

Adapted from a grep in python.

Accepts a list of filenames via sys.argv[2:] and does no exception handling:

#!/usr/bin/env python3
import re, sys, os

for f in filter(os.path.isfile, sys.argv[2:]):
    with open(f) as fh:
        for line in fh:
            # re.search finds the pattern anywhere in the line; re.match only at the start
            if re.search(sys.argv[1], line):
                print(line, end='')

sys.argv[1] and sys.argv[2:] work as shown if you run it as a standalone executable, meaning you need to run

chmod +x

on the script first.

Upvotes: 9

Piotr Dobrogost

Reputation: 42425

You might be interested in pyp. Citing my other answer:

"The Pyed Piper", or pyp, is a linux command line text manipulation tool similar to awk or sed, but which uses standard python string and list methods as well as custom functions evolved to generate fast results in an intense production environment.

Upvotes: 4

jldupont

Reputation: 96746

  1. use sys.argv to get the command-line parameters
  2. use open() and read() to read the file
  3. use the Python re module to match lines
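
The three steps above could be put together roughly as follows. This is only a sketch: the helper name grep_file is made up for illustration, and it does no error handling.

```python
import re
import sys

def grep_file(pattern, path):
    """Return the lines of the file at `path` that match `pattern`."""
    regex = re.compile(pattern)            # step 3: the re module
    with open(path) as f:                  # step 2: open() and read()
        lines = f.read().splitlines()
    return [line for line in lines if regex.search(line)]

# step 1: sys.argv holds the script name, then the pattern, then the file
if __name__ == "__main__" and len(sys.argv) == 3:
    for line in grep_file(sys.argv[1], sys.argv[2]):
        print(line)
```

Reading the whole file with read() keeps the sketch close to the steps listed; for large files, iterating over the file object line by line would use less memory.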

Upvotes: 5
