Reputation: 888
So in the book Automate the Boring Stuff with Python, there is this homework project:
Write a program that opens all .txt files in a folder and searches for any line that matches a user-supplied regular expression. The results should be printed to the screen.
Below is my code. I have two questions:
import os, re
# Dir Location
print('Enter a directory location: (in which txt files are located)')
direct= input()
os.chdir(direct)
# Regexes
print("Enter the text you'd like to search for: (or a regex)")
givenReg= input()
soloReg= re.compile(givenReg)
lineReg= re.compile((r'^\n.*')+givenReg+(r'.*\n$'))
txtFileReg= re.compile(r'.*\.txt')
# Texts in Dir
txtFiles= os.listdir(direct)
# Finding line through Regex
for i in range(len(txtFiles)):
    if txtFileReg.search(txtFiles[i]) != None:
        file= open(txtFiles[i])
        read= file.read()
        outcomeSolo= soloReg.findall(read)
        outcomeLine= lineReg.findall(read)
        # Print the bare matches first, then the lines that contain them
        print('In ' + txtFiles[i] + ', found these matches:')
        print(outcomeSolo)
        print('In ' + txtFiles[i] + ', the lines for these matches were:')
        print(outcomeLine)
        print('\n')
        file.close()
Upvotes: 1
Views: 2139
Reputation: 42421
One way to make the program shorter is to make it behave more like a typical command-line program: take inputs as arguments, rather than via some type of dialogue.
Another way is to make the output less chatty. Take a look at how grep works for one example.
You can also take advantage of things like glob().
Rather than reading the entire file into memory, just iterate over the file line by line (this has many advantages in programs like this).
Finally, it's not clear to me why you are wrapping the user's regular expression in your own leading and trailing patterns: just let the user fully control the regex (at least, that's what I'd do).
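To make the concern about the wrapper concrete, here is a minimal sketch. The sample text and the pattern foo are made up for illustration; the wrapper is the one from the question. Without re.MULTILINE, ^ only matches at the very start of the string, so the wrapped pattern finds nothing in ordinary multi-line text, while a plain per-line search does what the exercise asks for.

```python
import re

# Hypothetical file contents and user pattern, for illustration only.
text = 'first line\nthis one has foo in it\nlast line\n'
user_pattern = 'foo'

# The wrapper from the question: requires the whole text to start with '\n'.
wrapped = re.compile(r'^\n.*' + user_pattern + r'.*\n$')
wrapped_hits = wrapped.findall(text)

# Searching each line with the user's pattern, unmodified.
plain = re.compile(user_pattern)
line_hits = [line for line in text.splitlines() if plain.search(line)]

print(wrapped_hits)  # no matches: the text does not begin with a newline
print(line_hits)     # the single line containing 'foo'
```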
Here's a short illustration of these points:
import sys, glob, re

dir_path = sys.argv[1]
rgx = re.compile(sys.argv[2])
for path in glob.glob(dir_path + '/*.txt'):
    with open(path) as fh:
        for line in fh:
            if rgx.search(line):
                msg = '{}:{}'.format(path, line)
                print(msg, end='')
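If you want output closer to what grep -n prints, one possible variation is sketched below. The names grep_txt and main are my own, not from the book or from any library, and the path:line_number:line format is just one reasonable choice. The search is a generator so it can be tested separately from the printing.

```python
import glob
import os
import re


def grep_txt(dir_path, pattern):
    """Yield (path, line_number, line) for each matching line in *.txt files.

    grep_txt is a hypothetical helper name used only for this sketch.
    """
    rgx = re.compile(pattern)
    # sorted() gives a stable file order; glob itself makes no ordering promise.
    for path in sorted(glob.glob(os.path.join(dir_path, '*.txt'))):
        with open(path) as fh:
            # Iterate line by line instead of reading the whole file.
            for line_number, line in enumerate(fh, start=1):
                if rgx.search(line):
                    yield path, line_number, line.rstrip('\n')


def main(argv):
    """Print matches as path:line_number:line, similar to grep -n."""
    if len(argv) != 3:
        print('usage: {} DIR REGEX'.format(argv[0]))
        return 2
    for path, line_number, line in grep_txt(argv[1], argv[2]):
        print('{}:{}:{}'.format(path, line_number, line))
    return 0
```

To run it as a script, add sys.exit(main(sys.argv)) at the bottom (after import sys) and call it with a directory and a pattern as the two arguments.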
Upvotes: 1