daOnlyBG

Reputation: 601

How to read and write HTML to a file delimited by \n spaces?

I have an HTML file with \n spaces separating each element tag. We'll call this HTML file results_cache.html. I'd like to read results_cache.html with Python and then write its contents into another file, hopeful.html.

However, when writing the contents, I'd like to start a new line in hopeful.html each time a \n pops up. I was under the impression that Python would naturally do this; unfortunately, the entire HTML prints on one line only.

Here is my code:

lines = [str(line.rstrip('\n')) for line in open('results_cache.html')]

final_cache = open('hopeful.html','w')
for line in lines:
    final_cache.write(str(line))

final_cache.close()

This is a snapshot of what hopeful.html looks like:

'<table>\n <!-- ngRepeat: attempt in vm.getdate() --> <tr ng-repeat="attemp...

...with nothing else below it.

One thing I would like to point out is that the entire line is wrapped in single quotes. I don't know whether this affects the outcome.

The HTML was scraped off a website using Selenium Webdriver.

Upvotes: 0

Views: 48

Answers (1)

Jay Atkinson

Reputation: 3287

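Iterating over open('results_cache.html') does yield one line at a time, not one character at a time. The problem is that rstrip('\n') removes each line's trailing newline and write() never puts it back, so everything ends up on a single line in hopeful.html. It is also cleaner to read the file inside a "with" block so that it gets closed: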

with open('results_cache.html') as readfile:
    htmlfile = readfile.readlines()

lines = [line.rstrip('\n') for line in htmlfile]

Or you could do it quick and dirty:

lines = [line.rstrip('\n') for line in open('results_cache.html').readlines()]

But the "with" statement is the better choice: it closes the file even if an exception is raised during the file operations.
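
One more thing worth checking: the snapshot in the question shows the text wrapped in single quotes with a visible \n in it, which suggests the file may hold the two-character sequence backslash + n (the repr of a Python string) rather than real line breaks. If that is the case, no amount of line-by-line reading will split it. Below is a minimal sketch under that assumption; restore_newlines and copy_html are hypothetical helper names, not anything from the original code.

```python
def restore_newlines(text):
    """Turn literal backslash-n sequences into real line breaks."""
    text = text.strip()
    # Drop the wrapping single quotes seen in the question's snapshot, if present.
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        text = text[1:-1]
    # "\\n" is the two characters backslash + n; "\n" is a real newline.
    return text.replace("\\n", "\n")


def copy_html(src_path, dst_path):
    """Read src_path, restore its line breaks, and write the result to dst_path."""
    with open(src_path) as src:
        cleaned = restore_newlines(src.read())
    with open(dst_path, "w") as dst:
        dst.write(cleaned)


# Usage with the file names from the question:
# copy_html("results_cache.html", "hopeful.html")
```

If the file turns out to contain real newlines after all, replace() finds nothing to change and the content is copied through as it is, apart from leading and trailing whitespace being trimmed.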

Upvotes: 2
