Reputation: 267
I'm trying to iterate through an html file from the internet.
    target = br.response().read()
    for row in target:
        if "[some text]" in row:
            print next(target)
The problem is this loop iterates over each character in the html file, so it'll never find a match. How do I get it to iterate through each row instead?
I've tried target = target.splitlines(), but that really messes up the file.
Upvotes: 1
Views: 4629
Reputation: 1
Take a look at the page source for the page you're fetching, because that's exactly what you're getting back as a response. I have a feeling the response doesn't actually have newlines where you want them. For pages like http://docs.python.org/, where the source is readable, your splitlines() approach works great, but for sites whose source has essentially no line breaks, like Google's homepage, you'll run into the kind of problem you're describing.
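To make the difference concrete, here is a small sketch. Both HTML strings are made up for illustration (they are not taken from any real site); one has line breaks and the other is minified onto a single line.

```python
# Two hypothetical responses with the same content: one formatted with
# newlines, one minified onto a single line.
readable = "<html>\n<body>\n<p>[some text]</p>\n<p>next row</p>\n</body>\n</html>"
minified = "<html><body><p>[some text]</p><p>next row</p></body></html>"

# Iterating over a string yields single characters, not lines.
first_items = list(readable)[:3]

# splitlines() only splits where the source actually contains line breaks.
readable_lines = readable.splitlines()
minified_lines = minified.splitlines()

print(first_items)
print(len(readable_lines))
print(len(minified_lines))
```

On the minified string, splitlines() returns a single giant "line", so a line-by-line search cannot tell you what the "next row" is.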
Depending on what you are trying to achieve, your best bet might be to use an HTML/XML parsing library like lxml. Otherwise, using re is probably a pretty safe approach. Both are a lot better than trying to guess where the line breaks should be.
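As a rough sketch of the re approach: the snippet below assumes the text you want sits in the <p> element right after the one containing your marker. That layout, the marker, and the sample string are all assumptions for illustration; your page will almost certainly need a different pattern.

```python
import re

# Hypothetical minified response; the marker text and tags are made up.
html = "<html><body><p>[some text]</p><p>next row</p></body></html>"

# Escape the marker so the brackets are matched literally, then capture
# the contents of the <p> element that follows it.
pattern = re.compile(re.escape("[some text]") + r"</p>\s*<p>(.*?)</p>", re.DOTALL)

match = pattern.search(html)
following = match.group(1) if match else None
print(following)
```

This works whether or not the source contains line breaks, but a pattern like this is tied to the exact markup, so a real parser is the sturdier choice if the page structure is at all complicated.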
Upvotes: 0
Reputation: 5676
What you basically want to achieve is the following (reading from a file, as your title suggests):
    #!/usr/bin/env python
    import sys
    with open("test.txt") as file:
        for line in file:
            if "got" in line:
                print "found: {0}".format(line)
You open your file ("test.txt"), read each line (for ... in), and check whether the line contains a string, which is where the in operator comes in handy :)
If you are interested in the line number:
    for index, line in enumerate(file):
But beware: the index starts at 0, so the current line number is index + 1.
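If you would rather not add 1 yourself, enumerate also accepts a start argument. A minimal sketch, using a plain list of made-up lines in place of the open file so it runs on its own:

```python
# Stand-in for the lines of a file; the contents are invented for the example.
lines = ["first\n", "I got it\n", "last\n"]

matches = []
# start=1 makes the counter begin at 1 instead of 0.
for line_number, line in enumerate(lines, start=1):
    if "got" in line:
        matches.append((line_number, line.rstrip("\n")))

print(matches)
```

The same loop works unchanged on a real file object, since enumerate takes any iterable.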
Similarly, if you want to read from a string as if it were a file, take a look at StringIO.
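Here is a sketch of that idea applied to the question's "print the row after the match" goal. It is written for Python 3 and uses io.StringIO (on Python 2 the StringIO module plays the same role). The sample text is made up; in the question it would come from br.response().read(), and this only helps if that text really contains line breaks.

```python
import io

# Hypothetical response text with real line breaks.
target = "<table>\n<tr>[some text]</tr>\n<tr>the row after</tr>\n</table>\n"

rows = io.StringIO(target)
following_rows = []
for row in rows:
    if "[some text]" in row:
        # next() pulls the following line from the same iterator;
        # the default avoids StopIteration if the match is the last line.
        following_rows.append(next(rows, "").rstrip("\n"))

print(following_rows)
```

Because the StringIO object is its own iterator, calling next() on it inside the loop advances the loop as well, so the row after a match is consumed rather than tested again.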
Upvotes: 3