I'm trying to understand why everything has to be deleted from the list in the last line of the loop.
The task is: Find the link at position 18 (the first name is 1). Follow that link. Repeat this process 7 times. The answer is the last name that you retrieve.
# Position / count - 3 variant
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

taglist = list()
url = input("Enter URL: ")
count = int(input("Enter count: "))
position = int(input("Enter position: "))

for i in range(count):
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    for tag in tags:
        taglist.append(tag)
    url = taglist[position-1].get('href', None)
    del taglist[:]
    print("Retrieving:", url)
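To see exactly what that `del taglist[:]` statement does, here is a small stdlib-only illustration; the names `items` and `alias` are mine, not from the program above:

```python
# `del somelist[:]` empties a list in place (equivalent to
# somelist.clear() in Python 3.3+), affecting every name bound
# to that same list object.
items = [1, 2, 3]
alias = items          # second name for the same list object
del items[:]           # clears the shared list, not just one name
print(items)           # []
print(alias)           # [] - the alias sees the same, now-empty list
print(alias is items)  # True - still one object
```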
Upvotes: 0
Views: 54
Although that isn't the way I would do it, the deletion is there so you start with a fresh taglist every time. In these lines:
for tag in tags:
    taglist.append(tag)
you append to taglist. If you delete the contents of the list, you start fresh on each iteration of the outer for loop. The program would behave differently when you index into taglist if it still held all the tags from the previous iterations. The key lines to look at are:
position = int(input("Enter position: "))
and
url = taglist[position-1].get('href', None)
If you didn't reset taglist, position-1 would point at a different element: the link at that position on the first page you fetched, not on the current one.
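The indexing drift is easy to demonstrate with plain strings standing in for BeautifulSoup tags; the page contents below are invented for illustration:

```python
# Simulate two iterations of the outer loop, once with the reset
# and once without.
position = 2

# With the reset: each iteration indexes into the current page only.
taglist = []
for tags in (["a1", "a2", "a3"], ["b1", "b2", "b3"]):
    for tag in tags:
        taglist.append(tag)
    chosen_reset = taglist[position - 1]
    del taglist[:]
print(chosen_reset)   # b2 - the 2nd link of the *second* page

# Without the reset: old tags accumulate, so the index is stale.
taglist = []
for tags in (["a1", "a2", "a3"], ["b1", "b2", "b3"]):
    for tag in tags:
        taglist.append(tag)
    chosen_stale = taglist[position - 1]
print(chosen_stale)   # a2 - still the 2nd link of the *first* page
```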
I'm not sure I would say what you did is wrong, but without actually knowing about the site you are using this for, I would be inclined to use a list comprehension. The second way seems more Pythonic to me, and I also think it's more efficient.
# Instead of this
tags = soup('a')
for tag in tags:
    taglist.append(tag)
url = taglist[position-1].get('href', None)
del taglist[:]

# I would use this:
taglist = [tag for tag in soup('a')]
url = taglist[position-1].get('href', None)
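To show the whole follow-the-link loop with the comprehension in place, here is a stdlib-only sketch where a dict of made-up "pages" stands in for the network and BeautifulSoup; the page names and link lists are purely illustrative:

```python
# Fake site: each "page" maps to its list of link targets.
pages = {
    "start": ["x", "page1", "y"],
    "page1": ["x", "page2", "y"],
    "page2": ["x", "end",   "y"],
}

url = "start"
count = 3
position = 2

for _ in range(count):
    # Rebuilding the list each iteration makes the reset unnecessary.
    taglist = [tag for tag in pages[url]]
    url = taglist[position - 1]
    print("Retrieving:", url)
# Retrieving: page1
# Retrieving: page2
# Retrieving: end
```

Since rebinding taglist replaces the old list entirely, there is nothing left over to clear. (With real BeautifulSoup tags, `[tag for tag in soup('a')]` could also be written as `list(soup('a'))`.)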
Upvotes: 1