T.Galisnky

Reputation: 87

How do I write each regex match to a file on its own line using Python?

with open('output.txt', 'w') as output:
    for file in glob.glob('*.txt'):
        with open(file, 'r', encoding="latin-1") as f:
            for line in f:
                pattern = r"justwhatever"
                find = re.findall(pattern, line)
                try:
                    output.write('\n'.join(find[0:])+'\n')
                except UnicodeEncodeError:
                    pass

This has kept me up all night. I'm trying to search through big text files, and my code crashed from running out of memory, so I switched to going through the files line by line, as in the code above. But I just can't seem to write each result on a separate line.

I can either add a "\n" after every searched line, which leaves me with many blank lines, or have all the results run together and only get separate lines when there is more than one result on the same line ...

How would I go about searching my files line by line and writing out only the matches, each on a line of its own?

Upvotes: 0

Views: 1271

Answers (2)

Jaysheel Utekar

Reputation: 1196

I couldn't test it on your files, but I tried it on this file and it should work.

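import glob, re

# Compile the pattern once instead of rebuilding it for every line.
pattern = re.compile(r"Alice")

with open('output.txt', 'a') as output:
    for file in glob.glob('alice.txt'):
        with open(file, 'r') as f:
            # Iterate over the file object so only one line is in memory at a time
            # (f.readlines() would load the whole file).
            for line in f:
                find = pattern.findall(line)
                # Only write when there is a match, so no blank lines are produced.
                if find:
                    try:
                        # One match per output line.
                        output.write('\n'.join(find) + '\n')
                    except UnicodeEncodeError:
                        pass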
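
As far as I can tell, the blank lines in the question come from '\n'.join(find) being an empty string when nothing matched, so the trailing + '\n' still writes a bare newline for that line. Here is a minimal sketch of the formatting step on its own. The helper name format_matches is mine, not something from the question, and it assumes the pattern has no capturing groups (with groups, findall returns the groups rather than the whole match).

```python
import re

def format_matches(line, pattern):
    """Return the text to write for one input line.

    Each match gets its own output line; a line without matches
    yields an empty string, so nothing (not even a newline) is written.
    """
    found = pattern.findall(line)
    return "".join(match + "\n" for match in found)

pattern = re.compile(r"Alice")
print(repr(format_matches("Alice met Alice", pattern)))  # 'Alice\nAlice\n'
print(repr(format_matches("no match here", pattern)))    # ''
```

Writing the returned string with output.write(...) then needs no extra check, because writing an empty string adds nothing to the file.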

Upvotes: 1

T.Galisnky

Reputation: 87

I've found this:

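# f and output are the open file handles from the code in my question.
pattern = re.compile(r"yourpattern")  # compile once, outside the loop
for line in f:
    # re.search only returns the first match on each line.
    find = pattern.search(line)
    if find:
        try:
            output.write(find.group() + '\n')
        except UnicodeEncodeError:
            pass

This works, but very slowly. I hope someone has better code for this that parses faster.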
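
One possible direction, as a sketch rather than a measured fix: compile the pattern once, use finditer so every match on a line is written (not just the first), and open the output with the same encoding as the input so the try/except is not needed. The function name write_matches and its parameters are my own. I have not timed it on large files, so whether it is actually faster will depend on the pattern and the data.

```python
import re

def write_matches(in_paths, out_path, pattern, encoding="latin-1"):
    """Stream each input file line by line and write every match on its own line.

    Returns the number of matches written.
    """
    regex = re.compile(pattern)  # compiled once, not once per line
    count = 0
    # Using the same encoding for reading and writing means every character
    # that was decoded can also be encoded again on output.
    with open(out_path, "w", encoding=encoding) as out:
        for path in in_paths:
            with open(path, "r", encoding=encoding) as f:
                for line in f:  # one line in memory at a time
                    for match in regex.finditer(line):
                        out.write(match.group() + "\n")
                        count += 1
    return count
```

To use it with the setup from the question, something like write_matches([p for p in glob.glob('*.txt') if p != 'output.txt'], 'output.txt', r"yourpattern") should do. The filter matters: '*.txt' also matches output.txt, so without it the loop may end up reading the file it is writing to, which could be part of why the run is slow.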

Upvotes: 0
