Reputation: 1487
I have this script:
import urllib.request
from bs4 import BeautifulSoup
url= 'https://www.inforge.net/xi/forums/liste-proxy.1118/'
soup = BeautifulSoup(urllib.request.urlopen(url), "lxml")
base = ("https://www.inforge.net/xi/")
for tag in soup.find_all('a', {'class':'PreviewTooltip'}):
    links = (tag.get('href'))
    final = base + links
print (final[0])
which takes the link of every topic on this page.
The problem is that when I print(final[0]) the output is:
h
instead of the entire link. Can someone help me with this?
Upvotes: 2
Views: 60
Reputation: 160617
final is a str, so indexing it at position 0 returns the first character of the URL, which is h.
You either need to print all of final, if you're using it as a str:
print(final)
or, if you must have a list, make final a list in the for loop by enclosing the expression in square brackets []:
final = [base + links]
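Then print(final[0]) will print the first element of the list, as you'd expect. Note that final is reassigned on each pass through the loop, so it only ever holds the most recent link.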
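To see the difference concretely, here is a minimal sketch that runs without network access. The hrefs list is a made-up stand-in for the values tag.get('href') would return, and all_links is a hypothetical name; it shows one possible way to keep every link in a list instead of overwriting final on each pass.

```python
base = "https://www.inforge.net/xi/"

# Hypothetical href values standing in for what tag.get('href') returns.
hrefs = ["threads/example-one.1/", "threads/example-two.2/"]

all_links = []
for href in hrefs:
    final = base + href        # str + str gives a str
    all_links.append(final)    # keep each full link in a list

first_char = final[0]          # indexing a str yields one character
first_link = all_links[0]      # indexing a list yields one whole element

print(first_char)
print(first_link)
```

Indexing the string gives a single character, while indexing the list gives a complete URL.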
As @Bryan pointed out and I just noticed, it seems like you might be confused about the usage of () in Python. Without a comma inside them, the parentheses do absolutely nothing. If you add a comma, the expression becomes a tuple (not a list; lists use square brackets []).
So:
base = ("https://www.inforge.net/xi/")
results in base referring to a value of type str, while:
base = ("https://www.inforge.net/xi/", )
# which can also be written as:
base = "https://www.inforge.net/xi/",
results in base referring to a value of type tuple with a single element.
The same applies to the name links:
links = (tag.get('href')) # 'str'
links = (tag.get('href'), ) # 'tuple'
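A quick way to check this yourself is to inspect the types directly. The names plain, single, and bare below are only illustrative:

```python
plain = ("https://www.inforge.net/xi/")     # parentheses only: still a str
single = ("https://www.inforge.net/xi/",)   # trailing comma: 1-element tuple
bare = "https://www.inforge.net/xi/",       # the comma alone is enough

print(type(plain).__name__)
print(type(single).__name__)
print(single == bare)
```

It is the comma, not the parentheses, that creates the tuple.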
If you change links and base to be tuples, then final ends up as a 2-element tuple after final = base + links is executed. In that case, you should join the elements of the tuple in your print call:
print ("".join(final)) # takes all elements in final and joins them together
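As a rough sketch of the tuple case, with a made-up href value in place of tag.get('href') and joined as a hypothetical name:

```python
base = ("https://www.inforge.net/xi/",)
links = ("threads/example.1/",)   # hypothetical href, wrapped in a tuple

final = base + links              # tuple concatenation: a 2-element tuple
joined = "".join(final)           # glue the pieces back into one str

print(len(final))
print(joined)
```

Unless you specifically need tuples, plain strings are simpler here, since base + links already produces the full URL.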
Upvotes: 4