heltonbiker

Reputation: 27575

Find regex occurrences of a set of patterns in correct order with Python

I am parsing a series of text files for some patterns, because I want to extract the matches to another file.

A way to say it is that I would like to "remove" everything except the matches from the file.

For example, if I have pattern1, pattern2, pattern3 as matching patterns, I'd like the following input:

bla bla
pattern1
pattern2
bla bla bla
pattern1
pattern3
bla bla bla
pattern1

To give the following output:

pattern1
pattern2
pattern1
pattern3
pattern1

I can use re.findall and successfully get the list of matches for any single pattern, but I cannot think of a way to KEEP THE ORDER, considering that the matches of the different patterns are interleaved in the file.

Thanks for reading.

Upvotes: 1

Views: 218

Answers (2)

Inbar Rose

Reputation: 43437

Here is an answer in "copy this and go" format.

import re

# Add more patterns here whenever you want
list_of_regex = [r"aaaa", r"bbbb", r"cccc"]

# Combine the patterns into one alternation anchored at the start of a line
pattern_string = r"^(" + "|".join(list_of_regex) + r")"

# Read the whole input file (FILE_TO_READ is a placeholder for your path)
with open(FILE_TO_READ) as fr:
    search_string = fr.read()

# re.MULTILINE makes ^ match at the start of every line, not only at the
# start of the string; findall returns matches in the order they occur
matches = re.findall(pattern_string, search_string, re.MULTILINE)

# Write one match per line (FILE_TO_WRITE is a placeholder for your path)
with open(FILE_TO_WRITE, 'w') as fw:
    fw.writelines(match + "\n" for match in matches)

Upvotes: 2

Richard

Reputation: 30618

Combine it all into a single pattern; the matches then come back in the order they appear in the file. With your example input, use the pattern:

^pattern[0-9]+

If it's actually more complex, then try

^(aaaaa|bbbbb|ccccc|ddddd)
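
As a rough sketch of how the combined pattern could be applied, the snippet below runs an alternation over the sample input from the question. The `text` and `patterns` values are illustrative stand-ins, not anything from a real file. It assumes each match should start at the beginning of a line, so it passes `re.MULTILINE` to make `^` match at every line start, and it uses `finditer` with `group(0)` so the whole match is returned even if an individual pattern contains its own groups.

```python
import re

# Hypothetical sample text mirroring the input shown in the question
text = """bla bla
pattern1
pattern2
bla bla bla
pattern1
pattern3
bla bla bla
pattern1
"""

# Illustrative patterns; replace them with the real expressions
patterns = [r"pattern1", r"pattern2", r"pattern3"]

# One alternation, anchored at line starts via re.MULTILINE
combined = re.compile(r"^(?:" + "|".join(patterns) + r")", re.MULTILINE)

# finditer scans left to right, so matches keep their order in the text
matches = [m.group(0) for m in combined.finditer(text)]

print("\n".join(matches))
```

Because the regex engine scans the text once from start to end, the order of the results follows the order in the file regardless of which alternative matched.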

Upvotes: 5
