Reputation: 27575
I am parsing a series of text files for some patterns, because I want to extract the matches to another file.
Put another way, I would like to "remove" everything except the matches from the file.
For example, if I have pattern1, pattern2, pattern3 as matching patterns, I'd like the following input:
bla bla
pattern1
pattern2
bla bla bla
pattern1
pattern3
bla bla bla
pattern1
To give the following output:
pattern1
pattern2
pattern1
pattern3
pattern1
I can use re.findall and successfully get the list of matches for any single pattern, but I cannot think of a way to KEEP THE ORDER, since the matches of the different patterns are interleaved in the file.
Thanks for reading.
Upvotes: 1
Views: 218
Reputation: 43437
Here is an answer in "copy this and go" format.
import re
# add more patterns here whenever you want
list_of_regex = [r"aaaa", r"bbbb", r"cccc"]
# combine the patterns into one alternation, anchored at the start of a line
pattern_string = r"^(" + "|".join(list_of_regex) + r")"
# read the whole input file (FILE_TO_READ is a placeholder for your input path)
with open(FILE_TO_READ) as fr:
    search_string = fr.read()
# re.MULTILINE makes ^ match at the start of every line, not only the first one
matches = re.findall(pattern_string, search_string, re.MULTILINE)
# write the matches in file order, one per line (FILE_TO_WRITE is a placeholder for your output path)
with open(FILE_TO_WRITE, 'w') as fw:
    fw.writelines(match + "\n" for match in matches)
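If the files are large, a line-by-line variant avoids reading everything into memory at once. This is only a sketch: the function name `extract_matches` and its parameters are hypothetical, and it assumes every match starts at the beginning of a line.

```python
import re

def extract_matches(lines, patterns):
    """Yield the matched text of each line that starts with one of the patterns."""
    # Combine the patterns into a single alternation.
    combined = re.compile("|".join(patterns))
    for line in lines:
        # match() only matches at the start of the line, so no ^ is needed.
        found = combined.match(line)
        if found:
            yield found.group(0)

# Usage with an in-memory list; an open file object can be passed the same way.
sample = ["bla bla\n", "aaaa\n", "bbbb trailing\n", "bla aaaa\n", "cccc\n"]
result = list(extract_matches(sample, [r"aaaa", r"bbbb", r"cccc"]))
print(result)
```

Because the lines are visited in file order, the matches come out in file order as well.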
Upvotes: 2
Reputation: 30618
Combine it all into a single pattern; re.findall then returns the matches in the order they occur in the file. With your example input, use the pattern:
^pattern[0-9]+
If it's actually more complex, then try
^(aaaaa|bbbbb|ccccc|ddddd)
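A minimal sketch of the combined-pattern approach, assuming the three literal patterns from the question and an in-memory string in place of a real file (the names `text` and `combined` are illustrative). Note that `^` only matches at the start of each line when `re.MULTILINE` is passed; without it, `^` matches only at the very start of the string.

```python
import re

# Sample input from the question, held in a string instead of a file.
text = """bla bla
pattern1
pattern2
bla bla bla
pattern1
pattern3
bla bla bla
pattern1
"""

# One alternation covering every pattern; re.MULTILINE lets ^ match at each line start.
combined = re.compile(r"^(?:pattern1|pattern2|pattern3)", re.MULTILINE)

# findall scans the text left to right, so matches come back in file order.
matches = combined.findall(text)
print("\n".join(matches))
```

The non-capturing group `(?:...)` is a design choice here: with a single capturing group, `findall` returns the group contents instead, which happens to be the same text in this case.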
Upvotes: 5