Ada Stra

Reputation: 1581

Removing HTML until a certain point in a file

The problem is pretty simple:
I have a few thousand HTML files that I'd like to loop through and remove everything until the second instance of this:

<!--------------------------------------------------------->

I know how to load files, write loops, etc., in Python, but all my attempts to parse the files as text are failing.

Upvotes: 0

Views: 23

Answers (1)

Aaron

Reputation: 2393

You can try splitting the string on the marker and keeping everything after its second occurrence.

marker = '<!--------------------------------------------------------->'
source = "YOUR HTML FILE CONTENT"
# maxsplit=2 keeps everything after the second marker in one piece
print(source.split(marker, 2)[-1])
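To apply the same split to a whole folder of files, something like the following might work. It is only a sketch: the function names `strip_before_second_marker` and `process_directory` are made up for illustration, and it assumes the files are UTF-8 encoded, end in `.html`, and can safely be overwritten in place. Files with fewer than two markers are left untouched.

```python
from pathlib import Path

MARKER = "<!--------------------------------------------------------->"


def strip_before_second_marker(text, marker=MARKER):
    """Return the text after the second marker.

    If the marker occurs fewer than two times, return the text unchanged.
    """
    # maxsplit=2 yields at most three pieces; the third is everything
    # after the second marker, including any later markers.
    parts = text.split(marker, 2)
    if len(parts) < 3:
        return text
    return parts[2]


def process_directory(directory):
    """Rewrite every .html file directly inside `directory` in place.

    Assumption: files are UTF-8 and overwriting them is acceptable.
    """
    for path in Path(directory).glob("*.html"):
        original = path.read_text(encoding="utf-8")
        trimmed = strip_before_second_marker(original)
        if trimmed != original:
            path.write_text(trimmed, encoding="utf-8")
```

Calling `process_directory("path/to/files")` would then trim each file. It is probably worth running this on a copy of the folder first, since the originals are overwritten.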

Upvotes: 1
