soma

Reputation: 93

How to remove all lines above a certain line in Python

I have an HTML file from which I want to remove all lines above the line starting with the string <!DOCTYPE html

Example:

HTTP/1.1 400 Bad Request
Content-Type: text/html; charset=utf-8
Date: Sat, 22 Mar 2015 07:56:52 
Connection: close
Content-Length: 2959

<!DOCTYPE html...... extra lines ...

So when I find the line containing the string <!DOCTYPE, I need to remove all lines above it, including blank ones. On Linux, grep has an option that can search for the lines above and below a match so they can then be deleted. Can we do something similar in Python?

Upvotes: 0

Views: 1992

Answers (2)

RamiHafez

Reputation: 15

I'm not sure exactly what you mean, but I think you're opening an HTML file and trying to edit its contents. This may be unorthodox, but try opening the file read-only and using readlines() to read and store all of its lines. Filter out the lines you don't want. Then close the file, open it again for writing, and write the remaining lines back (this overwrites the file's current contents). This approach also lets you remove unwanted lines from the middle of the file.
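
A minimal sketch of that read, filter, and rewrite approach, assuming the whole file fits comfortably in memory. The function name `strip_above` is a placeholder and not anything from the question. The marker is matched with `startswith`, since the DOCTYPE line in the example has more text after it.

```python
def strip_above(path, marker="<!DOCTYPE html"):
    """Rewrite the file at `path`, dropping every line above the first
    line that starts with `marker`. Returns True if the marker was found.

    `strip_above` is a hypothetical helper name used for illustration.
    """
    # Read everything first; assumes the file is small enough for memory.
    with open(path) as f:
        lines = f.readlines()

    # Find the index of the first line that begins with the marker.
    for i, line in enumerate(lines):
        if line.lstrip().startswith(marker):
            break
    else:
        # Marker never appeared: leave the file untouched.
        return False

    # Reopen for writing (this truncates the file) and keep lines[i:] only.
    with open(path, "w") as f:
        f.writelines(lines[i:])
    return True
```

Calling `strip_above("page.html")` (file name assumed) would rewrite that file in place. If the marker is missing, the file is left as it was rather than being emptied.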

Upvotes: 0

inspectorG4dget

Reputation: 114025

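stop = "<!DOCTYPE html"

with open('input.html') as infile, open('output.html', 'w') as outfile:
    found = False  # becomes True once the marker line has been seen
    for line in infile:
        # The marker line may have trailing content, so match on the prefix
        if not found and line.lstrip().startswith(stop):
            found = True
        # Lines above the marker (including blank ones) are skipped
        if found:
            outfile.write(line)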

Upvotes: 1
