Reputation: 69
How can I get an absolute URL from a base (absolute) URL and a relative URL? The relative URL comes from the href
of a link.
This is what I tried:
import urllib
import urllib.request
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
mainurl = "http://www.bestbuy.ca"
theurl = "http://www.bestbuy.ca/en-CA/category/top-freezer-refrigerators/34734.aspx?type=product&page=1&pageSize=96"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")
producturl = soup.find('h4',{"class":"prod-title"}).find('a')
print (producturl)
fullurl = (mainurl,producturl)
print(fullurl)
Upvotes: 0
Views: 295
Reputation: 473873
As @keiv.fly already posted, you need to get the href
attribute value of the link. Then, instead of plain string concatenation, use urljoin()
to combine the base URL with the link's relative URL and produce an absolute URL.
I would also improve the way you are locating the link:
from urllib.parse import urljoin
product_url = soup.select_one('h4.prod-title a')["href"]
product_url = urljoin(mainurl, product_url)
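To illustrate how urljoin() resolves different kinds of href values, here is a minimal sketch. The page URL and all three href strings are made-up examples, not values taken from the actual site; only the standard library is used.

```python
from urllib.parse import urljoin

# Hypothetical page URL (shortened for the example).
page_url = "http://www.bestbuy.ca/en-CA/category/fridges/34734.aspx?page=1"

# Root-relative href: replaces the whole path of the base URL.
root_relative = urljoin(page_url, "/en-CA/product/123.aspx")

# Document-relative href: resolved against the directory of the page.
doc_relative = urljoin(page_url, "details/123.aspx")

# Parent-relative href: ".." climbs one directory before resolving.
parent_relative = urljoin(page_url, "../freezers/99.aspx")

for url in (root_relative, doc_relative, parent_relative):
    print(url)
```

Passing the URL of the page the link was found on (rather than only the site root) as the base is usually the safer choice, since document-relative links are resolved against that page.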
Upvotes: 1
Reputation: 4015
You should use ['href'] on the BeautifulSoup tag object to get the link as a string. Then just concatenate.
fullurl = mainurl + soup.find('h4',{"class":"prod-title"}).find('a')['href']
or
fullurl = mainurl + producturl['href']
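Note that concatenation only gives a valid result when the href starts with a slash. The sketch below compares it with urljoin() from the standard library; the href values are hypothetical, and by_concatenation is a helper name invented for this example.

```python
from urllib.parse import urljoin

mainurl = "http://www.bestbuy.ca"

def by_concatenation(href):
    # Hypothetical helper: the plain string concatenation approach.
    return mainurl + href

# Assumed example hrefs; real pages may use any of these forms.
hrefs = [
    "/en-CA/product/123.aspx",   # root-relative: both approaches agree
    "product/123.aspx",          # no leading slash: concatenation breaks
    "https://example.com/item",  # already absolute: concatenation breaks
]

for href in hrefs:
    print(by_concatenation(href), "|", urljoin(mainurl, href))
```

If every link on the page is known to be root-relative, concatenation is sufficient; otherwise urljoin() seems the more robust option.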
Upvotes: 0