Reputation: 29
I am aware that there are some quite similar posts about this on the forum, but I need this for a quick scan of a text file. I have to run 500 checks against a 1 GB file and print every line that contains certain phrases. Here is my code:
    import re

    with open('text.txt', 'r') as f:
        searchstrings = ('aaAa', 'bBbb')
        for line in f.readlines():
            for word in searchstrings:
                # wrap the phrase so it can match anywhere in the line
                word2 = ".*" + word + ".*"
                match = re.search(word2, line)
                if match:
                    print(word + " " + line)
I want it to return any line containing those phrases regardless of case, so even if the line is "BBjahdAAAAmm" it should be returned, because it contains "aaaa". aaAa and bBbb are just examples; the real list is completely different.
Upvotes: 1
Views: 60
Reputation: 4875
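Don't use f.readlines().
It loads the whole 1 GB file into memory at once. Iterate over the file object instead, which reads one line at a time.
Lowercase both the phrase and the line so the check ignores case. Instead do: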
    searchstrings = ('aaAa', 'bBbb')
    with open('text.txt', 'r') as f:
        for line in f:  # reads one line at a time
            lowered = line.lower()  # lowercase the line once, not once per phrase
            for word in searchstrings:
                if word.lower() in lowered:
                    print(word + " " + line)
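With 500 phrases, the inner loop above does 500 substring checks per line. One possible alternative, offered as a sketch rather than a measured improvement, is to combine all phrases into a single case-insensitive regex and search each line once. The names build_matcher and matching_lines are made up for this example. Note the difference in behavior: this reports only the first phrase found on each line, not every phrase that occurs in it.

```python
import re


def build_matcher(phrases):
    # Escape each phrase so it is matched literally, then join them
    # into one alternation: phrase1|phrase2|...
    pattern = "|".join(re.escape(p) for p in phrases)
    return re.compile(pattern, re.IGNORECASE)


def matching_lines(lines, phrases):
    # Yield (matched_text, line) for every line containing any phrase.
    # matched_text is the text as it appears in the line, not the
    # phrase as it was written in the search list.
    matcher = build_matcher(phrases)
    for line in lines:
        found = matcher.search(line)
        if found:
            yield found.group(0), line


# Hypothetical usage with the file from the question:
# with open('text.txt', 'r') as f:
#     for hit, line in matching_lines(f, ('aaAa', 'bBbb')):
#         print(hit + " " + line)
```

Whether this is faster than the plain substring loop depends on the phrases and the data, so it is worth timing both on a sample of the file before committing to either.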
Upvotes: 2