Gary Grey

Reputation: 145

Code runs one time and stops, doesn't loop?

f = open('C:/Users/Sikander/Desktop/bradpitt.html')
for line in f.readlines():
    p = line.partition('<a href="http://')
    url = p[2].partition('">')
    l = p[1] + url[0] + url[1]
    print(l)
    line = p[2]

This is my code. It runs once and stops, but I want it to keep running until line is empty. How can I do that? I'm printing the links from a webpage. My code gets the first link but not the others: if a page has 4 links, it prints only the first one and stops, and the other 3 are never printed.

What should I do?

Upvotes: 0

Views: 55

Answers (3)

Chris Clark

Reputation: 397

Since I can't comment yet...

I'm going to add on to @wtpoo's point. I think that is what is happening: an HTML document won't always contain line breaks, so readlines() may hand back the whole page as a single line. In that case it is working as intended.

The only thing I would add is to account for https:// links as well.
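
As a rough sketch of what accounting for both schemes could look like, a regular expression with an optional "s" matches http:// and https:// alike. The name find_links and the sample string are made up for illustration, and the pattern assumes the href is double-quoted and comes right after "<a ", as in the question's code.

```python
import re

# Matches <a href="http://..."> and <a href="https://...">.
# Assumption: href is the first attribute and uses double quotes.
LINK_RE = re.compile(r'<a href="(https?://[^"]*)"')

def find_links(html):
    """Return every http:// or https:// URL found in a double-quoted href."""
    return LINK_RE.findall(html)

# Hypothetical sample input, one long line like the page in the question.
sample = '<p><a href="http://example.com/a">A</a><a href="https://example.com/b">B</a></p>'
for url in find_links(sample):
    print(url)
```

Because findall scans the whole string, it does not matter whether the page is one line or many.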

Upvotes: 0

schafle

Reputation: 613

This is because the HTML page is just one huge line. Maybe you can loop through it with something like:

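with open('C:/Users/Sikander/Desktop/bradpitt.html') as f:
    text = f.read()
marker = '<a href="http://'
while marker in text:
    # drop everything up to and including the marker
    text = text[text.index(marker) + len(marker):]
    # the URL runs up to the closing double quote
    print('http://' + text.partition('"')[0])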

I would suggest using the Beautiful Soup module to collect all the links in the webpage.
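
Beautiful Soup is a third-party package, so here is a rough sketch of the same idea using only the standard library's html.parser instead (a swap on my part, not Beautiful Soup itself). LinkCollector, collect_links and the sample string are names invented for this example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

def collect_links(html):
    """Parse an HTML string and return all <a href> values in order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

# Hypothetical sample input; note the second link uses single quotes.
sample = '<a href="http://example.com/1">one</a> <a class="x" href=\'https://example.com/2\'>two</a>'
print(collect_links(sample))
```

A real parser copes with attribute order, single quotes and https:// links, which the string-matching approaches do not.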

Upvotes: 3

Marcel Wilson

Reputation: 4572

You forgot to indent the subsequent lines that should be in the loop.

It might be easier to see why your code didn't do what you expected in a script rather than in the console. Sometimes the presence of the >>> prompt can obscure indentation.

f = open('C:/Users/Sikander/Desktop/bradpitt.html')
for line in f.readlines():
p = line.partition('<a href="http://')
url = p[2].partition('">')
l = p[1] + url[0] + url[1]
print(l)
line = p[2]

I'm guessing what you wanted is something like this:

f = open('C:/Users/Sikander/Desktop/bradpitt.html')
for line in f.readlines():
    head, sep, tail = line.partition('<a href="http://')
    urlhead, urlsep, urltail = tail.partition('">')
    l = sep + urlhead + urlsep
    print(l)
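
If a single line can hold several links, one partition per line still finds only the first. A minimal sketch of repeating the partition on the remainder of the line until the marker no longer appears might look like this; links_in_line and the sample string are hypothetical names for illustration.

```python
def links_in_line(line, marker='<a href="http://'):
    """Return every marker-prefixed link opening found in one line."""
    found = []
    while True:
        head, sep, tail = line.partition(marker)
        if not sep:          # marker not found: nothing left to extract
            break
        url, end, rest = tail.partition('">')
        found.append(sep + url + end)
        line = rest          # keep scanning the remainder of the same line
    return found

# Hypothetical sample line containing two links.
sample = '<a href="http://a.example">A</a> text <a href="http://b.example">B</a>'
for link in links_in_line(sample):
    print(link)
```

In the loop over f.readlines() you would call links_in_line(line) and print each result.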

Upvotes: 0
