Reputation: 23
I have an HTML file and I need to remove all line breaks between the opening and closing BODY tags, turning this:
<HTML>
<HEAD>
<TITLE>
</TITLE>
</HEAD>
<BODY>
<P></P>
<P></P>
</BODY>
</HTML>
into this:
<HTML>
<HEAD>
<TITLE>
</TITLE>
</HEAD>
<BODY><P></P><P></P></BODY>
</HTML>
Upvotes: 1
Views: 1010
Reputation: 86
with open('name.html', 'r') as f:
    file_content = f.read()
# Locate the opening and closing BODY tags
start_index, end_index = file_content.index("<BODY>"), file_content.index("</BODY>")
head, body_content, tail = file_content[:start_index], file_content[start_index:end_index], file_content[end_index:]
# Remove newlines only inside the body span, then write the result back
new_html = head + body_content.replace("\n", "") + tail
with open('name.html', 'w') as f:
    f.write(new_html)
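The slicing above only finds an exact, upper-case `<BODY>`. If the tags might be lower-case or carry attributes, a regular expression can do the same job. This is a rough sketch, not a full HTML parser: the function name `strip_body_newlines` is made up for illustration, and it assumes the file has a single body element.

```python
import re

def strip_body_newlines(html: str) -> str:
    """Remove line breaks from the <body>...</body> span, matching tags case-insensitively."""
    # Non-greedy match from the opening body tag (attributes allowed) to the closing tag.
    pattern = re.compile(r"<body\b[^>]*>.*?</body\s*>", re.IGNORECASE | re.DOTALL)
    # Strip both \r and \n so Windows line endings are handled as well.
    return pattern.sub(
        lambda m: m.group(0).replace("\r", "").replace("\n", ""),
        html,
        count=1,
    )

sample = "<HTML>\n<HEAD>\n<TITLE>\n</TITLE>\n</HEAD>\n<BODY>\n<P></P>\n<P></P>\n</BODY>\n</HTML>\n"
print(strip_body_newlines(sample))
```

Text outside the body is left untouched, so the line breaks in the head section survive.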
Upvotes: 0
Reputation: 127
This is a little homemade and uses no external libraries (suppose your file is foo.html):
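with open('foo.html') as f:
    html_file = f.readlines()

# Indices of the two lines holding the opening and closing BODY tags
body_index = [i for i, line in enumerate(html_file) if 'BODY' in line]
start, end = body_index

# Strip the newline from the <BODY> line up to, but not including, the </BODY> line
for i in range(start, end):
    html_file[i] = html_file[i].replace('\n', '')

# Write the joined lines back to the file
with open('foo.html', 'w') as f:
    f.writelines(html_file)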
Upvotes: 0
Reputation: 5190
Read the whole HTML file into a string (htmlstring below), then do this:
bodystring = htmlstring[htmlstring.index('<BODY>'):htmlstring.index('</BODY>')+7]
htmlstring = htmlstring.replace(bodystring, bodystring.replace('\n',''))
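In case it helps, here is one way to get the file into a string and back out again around those two lines. It is only a sketch: the helper name `rewrite_body_in_place` is hypothetical, and it assumes the file contains exactly one upper-case `<BODY>`...`</BODY>` pair (otherwise `index` raises `ValueError`).

```python
from pathlib import Path

def rewrite_body_in_place(path: str) -> None:
    """Read the file, join the lines of the BODY span, and write the file back."""
    htmlstring = Path(path).read_text()
    # +7 is len('</BODY>'), so the slice includes the closing tag.
    bodystring = htmlstring[htmlstring.index('<BODY>'):htmlstring.index('</BODY>') + 7]
    htmlstring = htmlstring.replace(bodystring, bodystring.replace('\n', ''))
    Path(path).write_text(htmlstring)
```

Calling `rewrite_body_in_place('name.html')` overwrites the file, so keep a copy if you need the original.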
Upvotes: 1