Reputation: 93
I have an HTML file from which I want to remove every line above the line that starts with the string <!DOCTYPE html
Example:
HTTP/1.1 400 Bad Request
Content-Type: text/html; charset=utf-8
Date: Sat, 22 Mar 2015 07:56:52
Connection: close
Content-Length: 2959
<!DOCTYPE html...... extra lines ...
So when I find the first occurrence of the string <!DOCTYPE, I need to remove every line above that line, including blank ones. On Linux, grep has context options that can match the lines above and below a pattern, which can then be deleted. Can we do something similar in Python?
Upvotes: 0
Views: 1992
Reputation: 15
I'm not sure exactly what you mean, but I think you're opening an HTML file and trying to edit its contents. This may be unorthodox, but try opening it for reading and using readlines() to get and store all of the lines. Filter out the lines that you don't want. Then close the file, open it again for writing, and write the remaining lines back (opening for writing overwrites the current contents of the file). This also lets you remove unwanted lines from the middle of the file.
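The read-then-rewrite approach described above could be sketched roughly as follows. The function name `strip_above_marker` is made up for illustration, and the sketch assumes the file is UTF-8 text, fits in memory, and that the marker may be preceded by whitespace on its line.

```python
def strip_above_marker(path, marker="<!DOCTYPE html"):
    """Rewrite the file at `path`, dropping every line above the first
    line that starts with `marker`. Returns True if the marker was found."""
    # Read pass: load all lines into memory (assumes the file is small enough).
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()

    # Find the first line that starts with the marker, ignoring leading whitespace.
    for index, line in enumerate(lines):
        if line.lstrip().startswith(marker):
            break
    else:
        # Marker never appeared: leave the file untouched.
        return False

    # Write pass: mode "w" truncates the file, then only the kept lines are written.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines[index:])
    return True
```

Calling `strip_above_marker("input.html")` would then leave the file starting at the `<!DOCTYPE html` line. Because the file is truncated on the second open, it may be worth keeping a backup copy while testing.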
Upvotes: 0
Reputation: 114025
marker = "<!DOCTYPE html"
with open('input.html') as infile, open('output.html', 'w') as outfile:
    found = False
    for line in infile:
        # Skip every line, blank or not, until the first line starting with the marker.
        if not found and line.lstrip().startswith(marker):
            found = True
        # From the marker line onward, copy everything through unchanged.
        if found:
            outfile.write(line)
Upvotes: 1