Keep contents between one pattern and another pattern

Question

I want to parse HTML content and keep only the part from a start marker (A) to an end marker (B), for example:

some content1...

some content2

some content3

will become


some content2

Currently, I use sed to do this:

sed '/begin_here/,/end_here/!d' file.html > file2.html

However, I'd like to rewrite it in Python for cross-platform reasons. I am not very familiar with regular expressions in Python. Could you give me some hints on how to do this? Thanks a lot :)
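As a hint for the regex route, here is a minimal sketch using Python's `re` module. It assumes the whole file fits in memory and that `begin_here` and `end_here` appear literally in the text. The function name `extract_between` is made up for illustration. `re.escape` treats the markers as plain text, the non-greedy `(.*?)` stops at the first `end_here` after each `begin_here`, and `re.DOTALL` lets `.` match newlines so a block can span several lines.

```python
import re


def extract_between(text: str, begin: str = "begin_here", end: str = "end_here") -> list[str]:
    """Return the text found between each begin/end marker pair (markers excluded).

    Hypothetical helper for illustration; reads from an in-memory string.
    """
    # re.escape: treat the markers as literal text, not regex syntax.
    # (.*?): non-greedy, so each match ends at the nearest following end marker.
    # re.DOTALL: "." also matches newlines, so blocks can span lines.
    pattern = re.compile(re.escape(begin) + r"(.*?)" + re.escape(end), re.DOTALL)
    # With exactly one capturing group, findall returns just the group's text.
    return pattern.findall(text)


text = "some content1\nbegin_here\nsome content2\nend_here\nsome content3\n"
for block in extract_between(text):
    print(block.strip())  # prints: some content2
```

To use it on a file, read the content first (for example `text = f.read()` inside a `with open("file.html") as f:` block). Unlike the sed command, this keeps only the text between the markers, not the marker lines themselves.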

jadkik94 · Accepted Answer

You can do it without regular expressions, like so:

add_next = False  # Do not write lines yet,
# until the first "begin_here" line sets the flag to True.
# The marker lines themselves are not copied (the sed command keeps them).
with open("file1.html", "r") as in_file, open("file2.html", "w") as out_file:
    for line in in_file:
        if "end_here" in line:  # or line.startswith("end_here"), for example
            add_next = False
        if add_next:
            out_file.write(line)
        if "begin_here" in line:
            add_next = True
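The loop above drops the marker lines, while `sed '/begin_here/,/end_here/!d'` keeps them: sed prints every line from a `begin_here` match through the next `end_here` match, only tests the end pattern on lines after the one that opened the range, and can open a new range later. If you want that exact behaviour, a minimal sketch follows. The generator name `sed_range` and the sample lines are made up for illustration, and the patterns use Python `re` syntax, which differs slightly from sed's basic regexes for special characters.

```python
import re
from typing import Iterable, Iterator


def sed_range(lines: Iterable[str], start: str, end: str) -> Iterator[str]:
    """Yield each block of lines from a `start` match through the next `end` match.

    Hypothetical helper mimicking sed '/start/,/end/!d': both marker lines are
    kept, `end` is only tested on lines after the one that opened the block,
    and an unterminated block runs to the end of the input.
    """
    start_re = re.compile(start)
    end_re = re.compile(end)
    in_range = False
    for line in lines:
        if in_range:
            yield line
            if end_re.search(line):
                in_range = False  # close the block after keeping the end marker
        elif start_re.search(line):
            in_range = True  # open a block and keep the start marker
            yield line


# Made-up sample lines; in practice `lines` would be an open file object.
sample = ["some content1\n", "begin_here\n", "some content2\n", "end_here\n", "some content3\n"]
print("".join(sed_range(sample, "begin_here", "end_here")), end="")
# prints begin_here, some content2 and end_here on three lines
```

To write the result to a file, `out_file.writelines(sed_range(in_file, "begin_here", "end_here"))` inside the same `with` statement as the answer's code should work.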
