Daria

Reputation: 163

Removing everything from the taglist

I'm trying to understand why everything has to be deleted from the list on the last line of the loop (del taglist[:]).

The task is: Find the link at position 18 (the first name is 1). Follow that link. Repeat this process 7 times. The answer is the last name that you retrieve.

# Position / count - 3 variant
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

taglist=list()
url=input("Enter URL: ")
count=int(input("Enter count:"))
position=int(input("Enter position:"))
for i in range(count):
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags=soup('a')
    for tag in tags:
        taglist.append(tag)
    url = taglist[position-1].get('href', None)
    del taglist[:]
print("Retrieving:", url)

Upvotes: 0

Views: 54

Answers (1)

Stephen C

Reputation: 2036

Although it isn't how I would write it, the del is there so that you start with an empty taglist on every iteration. In these lines:

for tag in tags:
    taglist.append(tag)

you append to the taglist. If you delete the content of the list, you will start fresh each iteration of the outer for loop.

The code would behave differently when you index into taglist if it still held all the tags from the previous iterations. The key lines to look at for this are:

position=int(input("Enter position:"))

and

url = taglist[position-1].get('href', None)

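If you didn't reset taglist, position-1 would keep pointing at the link from the first page, because the tags appended on later iterations land after the ones already in the list. You would fetch the same URL over and over instead of following the chain.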
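To make the effect concrete, here is a minimal sketch that needs no network access. The pages list and the pick_links helper are made up for illustration; each inner list stands in for the a tags found on one page.

```python
# Hypothetical stand-ins for the <a> tags found on three successive pages.
pages = [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
]
position = 2


def pick_links(pages, position, reset):
    """Return the tag chosen on each page, with or without clearing taglist."""
    taglist = []
    picked = []
    for tags in pages:
        for tag in tags:
            taglist.append(tag)
        picked.append(taglist[position - 1])
        if reset:
            del taglist[:]  # empty the list in place, as in the question
    return picked


with_reset = pick_links(pages, position, reset=True)
without_reset = pick_links(pages, position, reset=False)
print(with_reset)     # ['a2', 'b2', 'c2']
print(without_reset)  # ['a2', 'a2', 'a2']
```

With the reset, the second tag of each new page is chosen. Without it, the index always hits the second tag of the first page, since that element never moves.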


I wouldn't say what you did is wrong, but, without knowing the site you are using this on, I would be inclined to use a list comprehension, as in the second version below. It builds a new list on every iteration, so there is nothing left over to delete. It seems more Pythonic to me, and I also think it's more efficient.

# Instead of this
tags=soup('a')
for tag in tags:
    taglist.append(tag)
url = taglist[position-1].get('href', None)
del taglist[:]

# I would use this:
taglist = [tag for tag in soup('a')]
url = taglist[position-1].get('href', None)
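As a rough sketch of how the whole loop looks when the list is rebuilt on each pass, the example below replaces the network fetch and parsing with a dictionary lookup so it can run offline. FAKE_SITE and follow_links are hypothetical names, not part of the original code; in the real script the hrefs would come from soup('a').

```python
# Hypothetical site: maps each URL to the hrefs of its <a> tags, in page order.
FAKE_SITE = {
    "page0": ["page1", "page2"],
    "page1": ["page2", "page3"],
    "page2": ["page3", "page0"],
    "page3": ["page0", "page1"],
}


def follow_links(url, count, position):
    """Follow the link at the given 1-based position, count times."""
    for _ in range(count):
        # A fresh list is bound on every pass, so no del is needed.
        hrefs = list(FAKE_SITE[url])
        url = hrefs[position - 1]
        print("Retrieving:", url)
    return url


final_url = follow_links("page0", count=3, position=2)
```

Because hrefs is rebound to a new list each time, nothing from the previous page can leak into the index lookup.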

Upvotes: 1
