Reputation: 87
import glob, re

with open('output.txt', 'w') as output:
    for file in glob.glob('*.txt'):
        with open(file, 'r', encoding="latin-1") as f:
            for line in f:
                pattern = r"justwhatever"
                find = re.findall(pattern, line)
                try:
                    output.write('\n'.join(find[0:]) + '\n')
                except UnicodeEncodeError:
                    pass
This has kept me up all night. I'm trying to search through big text files; my code was crashing because it ran out of memory, so I switched to going through the files line by line as shown above, but I still can't get each result written on a separate line.
I can either add a "\n" to every line I search, which leaves me with many blank lines, or have all the results stacked together, only getting separate lines when there is more than one match on the same line.
How would I go about searching my files line by line and writing out only the matches, each on a line of its own?
Upvotes: 0
Views: 1271
Reputation: 1196
I couldn't test it on your files, but I tried it on this file and it should work.
import glob, re

with open('output.txt', 'a') as output:
    for file in glob.glob('alice.txt'):
        with open(file, 'r') as f:
            for line in f.readlines():
                pattern = r"Alice"
                find = re.findall(pattern, line)
                if find:  # skip lines with no match, so no blank lines are written
                    try:
                        output.write(' '.join(find[0:]) + '\n')
                    except UnicodeEncodeError:
                        pass
Upvotes: 1
Reputation: 87
I've found this:
for line in f:
    pattern = r"yourpattern"
    find = re.search(pattern, line)
    if find:
        try:
            output.write(find.group() + '\n')
        except UnicodeEncodeError:
            pass
which works, but very slowly; I hope someone has a faster way to do this.
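A sketch of one thing that might help (untested on my big files; `yourpattern`, `*.txt` and the `latin-1` encoding are just the placeholders from the original code): compile the pattern once outside the loops instead of rebuilding it for every line, and keep iterating the file lazily:

import glob, re

pattern = re.compile(r"yourpattern")  # compile once instead of on every line

with open('output.txt', 'w') as output:
    for file in glob.glob('*.txt'):
        with open(file, 'r', encoding="latin-1") as f:
            for line in f:  # iterate lazily, never loading the whole file
                match = pattern.search(line)
                if match:
                    try:
                        output.write(match.group() + '\n')
                    except UnicodeEncodeError:
                        pass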
Upvotes: 0