Reputation: 2957
I want to parse an HTML file and keep only the content between two marker comments. For example:
some content1...
<!-- begin_here -->
some content2
<!-- end_here -->
some content3
should become
<!-- begin_here -->
some content2
<!-- end_here -->
Currently I use sed for this:
sed '/begin_here/,/end_here/!d' file.html > file2.html
However, I'd like to rewrite it in Python so that it works cross-platform. I am not very familiar with regex in Python. Could you give me some hints on how to do this? Thanks a lot :)
Upvotes: 0
Views: 85
Reputation: 23364
Use a multiline regex:
import re
# DOTALL lets "." match newlines; MULTILINE makes ^ and $ match at line boundaries
pat = re.compile(r'^<!-- begin_here -->.*?<!-- end_here -->$',
                 re.DOTALL | re.MULTILINE)
with open("file.txt") as f:
    print(pat.findall(f.read()))
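A minimal sketch of the same pattern applied to an in-memory string, assuming each marker sits on its own line. The helper name `extract_block` and the `sample` text are made up for illustration:

```python
import re

# Non-greedy match from the begin marker to the first end marker.
# DOTALL lets "." cross newlines; MULTILINE anchors ^ and $ at line boundaries.
PATTERN = re.compile(r"^<!-- begin_here -->.*?<!-- end_here -->$",
                     re.DOTALL | re.MULTILINE)


def extract_block(text):
    """Return the first marked block, markers included, or None if absent."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


sample = (
    "some content1...\n"
    "<!-- begin_here -->\n"
    "some content2\n"
    "<!-- end_here -->\n"
    "some content3\n"
)
print(extract_block(sample))
```

Using `search` instead of `findall` returns a single string rather than a list, which is convenient when only one marked block is expected.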
Upvotes: 2
Reputation: 7068
You can do it without regular expressions, like so:
add_next = False  # Do not write lines
# until you encounter the first "begin_here", which sets it to True
with open("file1.html", "r") as in_file:
    with open("file2.html", "w") as out_file:
        for line in in_file:
            if "begin_here" in line:
                add_next = True  # start copying, marker line included
            if add_next:
                out_file.write(line)
            if "end_here" in line:  # or a stricter test on the stripped line
                add_next = False  # stop after the end marker has been written
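A sketch of the same line-by-line approach as a reusable generator that keeps the marker lines, as the sed command in the question does. The name `keep_range` and its parameters are hypothetical. As with a sed address range, the end marker is only tested on lines after the one that opened the range:

```python
def keep_range(lines, begin="begin_here", end="end_here"):
    """Yield lines from one containing `begin` through the next containing `end`."""
    inside = False
    for line in lines:
        if inside:
            yield line
            if end in line:
                inside = False  # range closed; a later begin marker reopens it
        elif begin in line:
            inside = True
            yield line


sample_lines = [
    "some content1...\n",
    "<!-- begin_here -->\n",
    "some content2\n",
    "<!-- end_here -->\n",
    "some content3\n",
]
kept = list(keep_range(sample_lines))
print("".join(kept), end="")
```

Because it accepts any iterable of lines, the generator can be fed an open file object directly, for example `out_file.writelines(keep_range(in_file))`.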
Upvotes: 2