hman_codes

Reputation: 888

How to get the whole line in which a regex is matched?

So in the book Automate the Boring Stuff with Python, there is this homework project:

Write a program that opens all .txt files in a folder and searches for any line that matches a user-supplied regular expression. The results should be printed to the screen.

Below is my code. I have two questions:

  1. Is there a shorter version for the program?
  2. Something seems to be wrong with the line regex, lineReg, which is meant to match the whole line in which the user-specified regex appears: it is not showing any results. I have already tried removing the parentheses around the leading and trailing parts of the regex.

import os, re

# Dir Location
print('Enter a directory location: (in which txt files are located)')
direct= input()
os.chdir(direct)

# Regexes
print("Enter the text you'd like to search for: (or a regex)")
givenReg= input()
soloReg= re.compile(givenReg)
lineReg= re.compile((r'^\n.*')+givenReg+(r'.*\n$'))
txtFileReg= re.compile(r'.*\.txt')

# Texts in Dir
txtFiles= os.listdir(direct)

# Finding line through Regex
for i in range(len(txtFiles)):
    if txtFileReg.search(txtFiles[i]) != None:
        file= open(txtFiles[i])
        read= file.read()

        outcomeSolo= soloReg.findall(read)
        outcomeLine= lineReg.findall(read)

        print('In ' + txtFiles[i] + ', found these matches:')
        print(outcomeLine)

        print('In ' + txtFiles[i] + ', the lines for these matches were:')
        print(outcomeSolo)

        print('\n')
        file.close()

Upvotes: 1

Views: 2139

Answers (1)

FMc

Reputation: 42421

One way to make the program shorter is to make it behave more like a typical command-line program: take inputs as arguments, rather than via some type of dialogue.

Another way is to make the output less chatty. Take a look at how grep works for one example.

You can also take advantage of things like glob().

Rather than reading the entire file into memory, just iterate over the file line by line. In a program like this, that keeps memory use flat for large files, and each line you test is already the unit you want to print.

Finally, it's not clear to me why you are wrapping the user's regular expression in your own leading and trailing patterns: just let the user fully control the regex (at least, that's what I'd do).

Here's a short illustration of these points:

import sys, glob, re

dir_path = sys.argv[1]
rgx = re.compile(sys.argv[2])

for path in glob.glob(dir_path + '/*.txt'):
    with open(path) as fh:
        for line in fh:
            if rgx.search(line):
                msg = '{}:{}'.format(path, line)
                print(msg, end = '')

Upvotes: 1
