Caesius
Reputation: 267

Python: Iterate through an html file

I'm trying to iterate through an html file from the internet.

target = br.response().read()
for row in target:
    if "[some text]" in row:
        print next(target)

The problem is that this loop iterates over each character in the HTML file, so it never finds a match. How do I get it to iterate over each row instead?

I've tried target = target.splitlines(), but that really messes up the file.

Upvotes: 1

Views: 4629

Answers (2)

HairdryerOfRassilon
Reputation: 1

Take a look at the page source for the page you're viewing, because that's what you're getting back as a response. I have a feeling the response doesn't actually have newlines where you want them. For pages like http://docs.python.org/, where the source is readable, your splitlines() approach works great. For sites where the source has essentially no line breaks, like Google's homepage, splitlines() returns only a few very long lines, which is probably the problem you're running into.
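
To make the difference concrete, here is a small sketch. Both HTML strings are made-up stand-ins for a real response, not the actual source of either site.

```python
# Hypothetical page sources: one with line breaks, one minified
readable = "<html>\n<body>\n<p>[some text]</p>\n</body>\n</html>"
minified = "<html><body><p>[some text]</p></body></html>"

# Iterating over a string yields single characters, so a multi-character
# substring test against each item can never succeed
first_items = list(readable)[:3]

# splitlines() only splits where the source really contains line breaks
readable_lines = readable.splitlines()
minified_lines = minified.splitlines()

print(first_items)          # ['<', 'h', 't']
print(len(readable_lines))  # 5
print(len(minified_lines))  # 1
```

With the minified source the whole page comes back as a single "line", so there is no next row to move on to.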

Depending on what you are trying to achieve, your best bet might be to use an HTML/XML parsing library like lxml. Otherwise, using re is probably a pretty safe approach. Both are a lot better than trying to guess where the line breaks should be.
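
As a rough sketch of the re route, assuming the "rows" you care about are table rows (the question doesn't say): the html string and the row_after helper below are invented for illustration. A real parser such as lxml will cope better with nested or malformed markup.

```python
import re

# Hypothetical stand-in for the string returned by br.response().read()
html = ("<table><tr><td>header</td></tr>"
        "<tr><td>[some text]</td></tr>"
        "<tr><td>wanted value</td></tr></table>")

# Non-greedy match of each <tr>...</tr> block; DOTALL lets "." span newlines,
# so it does not matter whether the source contains line breaks
rows = re.findall(r"<tr\b.*?</tr>", html, flags=re.DOTALL | re.IGNORECASE)

def row_after(rows, marker):
    """Return the row following the first row that contains marker, or None."""
    for i, row in enumerate(rows[:-1]):
        if marker in row:
            return rows[i + 1]
    return None

result = row_after(rows, "[some text]")
print(result)
```

This splits on the markup itself rather than on newlines, so it behaves the same for readable and minified pages.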

Upvotes: 0

Thomas Junk
Reputation: 5676

What you basically want to achieve is the following (reading from a file, as your title suggests):

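#!/usr/bin/env python

# Open the file and scan it one line at a time
with open("test.txt") as file:
    for line in file:
        # "in" performs a substring test on the current line
        if "got" in line:
            # Parenthesized print works on both Python 2 and Python 3
            print("found: {0}".format(line))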

This opens your file ("test.txt"), reads it line by line (for ... in), and checks whether each line contains a given string, which is where the in operator comes in handy :)

If you are interested in the line number:

    for index, line in enumerate(file):

But beware: the index starts at 0, so the current line number is index + 1. Alternatively, pass start=1 to enumerate so the counter begins at 1.
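
A minimal sketch of the line-number variant, using a made-up list of lines in place of test.txt so it runs on its own:

```python
# Hypothetical stand-in for the lines of test.txt
lines = ["nothing here\n", "I got it\n", "still nothing\n", "got another\n"]

matches = []
# start=1 makes the counter match human-style line numbers
for number, line in enumerate(lines, start=1):
    if "got" in line:
        matches.append((number, line.rstrip("\n")))

print(matches)  # [(2, 'I got it'), (4, 'got another')]
```

The same loop works unchanged on an open file object, since files are iterated line by line as well.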

Similarly, if you want to read from a string as if it were a file, take a look at StringIO.
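
Here is a sketch of that idea applied to the question, assuming Python 3, where the class lives in io.StringIO (in Python 2 it is in the StringIO module). The target string is a made-up stand-in for the already-decoded page text.

```python
import io

# Hypothetical stand-in for the decoded text of the downloaded page
target = "first line\n[some text]\nline after the match\nlast line\n"

found = []
stream = io.StringIO(target)
for line in stream:
    if "[some text]" in line:
        # next() advances the same iterator as the for loop; the None default
        # avoids StopIteration when the match is on the last line
        following = next(stream, None)
        if following is not None:
            found.append(following.rstrip("\n"))

print(found)  # ['line after the match']
```

Because the StringIO object is its own iterator, next(stream) really does return the line after the match, which is what next(target) in the question was aiming for.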

Upvotes: 3
