Reputation: 145
f = open('C:/Users/Sikander/Desktop/bradpitt.html')
for line in f.readlines():
    p = line.partition('<a href="http://')
    url = p[2].partition('">')
    l = p[1] + url[0] + url[1]
    print(l)
    line = p[2]
This is my code. It runs once and stops, but I want it to keep going until line is empty. I'm printing links from a webpage: my code gets the first link but not the others. If a page has 4 links, it prints only the first one and the other 3 are never printed.
What should I do?
Upvotes: 0
Views: 55
Reputation: 397
Since I can't comment yet...
I am going to add on to @wtpoo's answer. I think this is the case: you won't always get a carriage return in an HTML document, so readlines() is working as intended.
The only thing I would add is to account for https:// links as well.
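One way to cover both schemes is a regular expression with an optional "s". This is a minimal sketch, assuming (like the question's code) that href is the first attribute of the tag and uses double quotes; the pattern and the find_links helper are hypothetical names for illustration, and a real parser is more robust.

```python
import re

# Hypothetical pattern: matches <a href="http://..."> and <a href="https://...">
# and captures the URL up to the closing double quote. Assumes href is the
# first attribute and is double-quoted, as in the question's HTML.
LINK_RE = re.compile(r'<a href="(https?://[^"]*)"')


def find_links(text):
    """Return every http:// or https:// href in text, in document order."""
    return LINK_RE.findall(text)


sample = ('<p><a href="http://example.com/a">A</a> '
          '<a href="https://example.com/b">B</a></p>')
for url in find_links(sample):
    print(url)
```

Because findall scans the whole string, it also finds every link when the page is one long line.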
Upvotes: 0
Reputation: 613
This is because the HTML page is just one huge line. Maybe you can loop through it with something like:
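with open('C:/Users/Sikander/Desktop/bradpitt.html') as f:
    text = f.read()
marker = '<a href="http://'
while marker in text:
    # drop everything up to and including the marker (16 characters)
    text = text[text.index(marker) + len(marker):]
    # process it: here, print the URL up to the closing quote
    print('http://' + text.partition('"')[0])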
I would suggest using the Beautiful Soup module to collect all the links in the webpage.
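If installing Beautiful Soup isn't an option, the standard library's html.parser can do the same job. The sketch below swaps in html.parser.HTMLParser rather than Beautiful Soup; LinkCollector and collect_links are hypothetical names. It relies on HTMLParser lowercasing tag and attribute names and stripping quotes from attribute values, so quoting style and letter case don't matter.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs
        if tag == 'a':
            for name, value in attrs:
                # value is None for a bare attribute, so skip empty hrefs
                if name == 'href' and value:
                    self.links.append(value)


def collect_links(html_text):
    """Return all <a href> values in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


page = '<a href="http://example.com/a">A</a><A HREF=\'https://example.com/b\'>B</A>'
print(collect_links(page))
```

With Beautiful Soup itself, the equivalent is roughly [a['href'] for a in BeautifulSoup(html_text, 'html.parser').find_all('a', href=True)].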
Upvotes: 3
Reputation: 4572
You forgot to indent the subsequent lines that should be in the loop.
It might be easier to see why your code didn't do what you expected in a script rather than in the console. Sometimes the presence of the >>> prompt can obscure indentation.
f = open('C:/Users/Sikander/Desktop/bradpitt.html')
for line in f.readlines():
    p = line.partition('<a href="http://')
    url = p[2].partition('">')
    l = p[1] + url[0] + url[1]
    print(l)
    line = p[2]
I'm guessing what you wanted is something like this:
f = open('C:/Users/Sikander/Desktop/bradpitt.html')
for line in f.readlines():
    head, sep, tail = line.partition('<a href="http://')
    urlhead, urlsep, urltail = tail.partition('">')
    # rebuild '<a href="http://...">' from the marker, the URL and '">'
    l = sep + urlhead + urlsep
    print(l)
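The loop above still prints only the first link on each line. To keep going until the line is used up, as the question asks, the partition can be repeated inside a while loop, continuing from the text after each match. This is a minimal sketch; links_in_line and MARKER are hypothetical names, and like the question's code it only looks for '<a href="http://'.

```python
MARKER = '<a href="http://'


def links_in_line(line):
    """Yield every '<a href="http://...">' fragment in line, not just the first."""
    while True:
        head, sep, tail = line.partition(MARKER)
        if not sep:  # marker not found: nothing left to extract
            return
        url, urlsep, rest = tail.partition('">')
        yield sep + url + urlsep
        line = rest  # continue searching after the end of this link


sample = 'x <a href="http://a.com">A</a> y <a href="http://b.com">B</a>\n'
for link in links_in_line(sample):
    print(link)
```

To use it on the file, call links_in_line(line) inside the existing for line in f.readlines(): loop. The line = rest step does what the question's line = p[2] appears to have been meant to do.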
Upvotes: 0