nobb666

Reputation: 69

Trying to add two URLs together to get one URL

How can I get an absolute URL from a base URL and a relative URL? The relative URL comes from the href of a link.

This is what I tried:

import urllib
import urllib.request
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

mainurl = "http://www.bestbuy.ca"
theurl = "http://www.bestbuy.ca/en-CA/category/top-freezer-refrigerators/34734.aspx?type=product&page=1&pageSize=96"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")

producturl = soup.find('h4',{"class":"prod-title"}).find('a')

print (producturl)

fullurl = (mainurl,producturl)

print(fullurl)

Upvotes: 0

Views: 295

Answers (2)

alecxe

Reputation: 473873

As @keiv.fly already posted, you need to get the href attribute value of the link. Then, instead of plain string concatenation, use urljoin() to combine the base URL with the link's relative URL and produce an absolute URL.

I would also improve the way you are locating the link:

from urllib.parse import urljoin

product_url = soup.select_one('h4.prod-title a')["href"]
product_url = urljoin(mainurl, product_url)
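
To illustrate how urljoin() resolves different kinds of href values, here is a minimal sketch using only the standard library. The href strings below are hypothetical examples, not values taken from the actual page. It also assumes you pass the URL of the page itself as the base, which lets document-relative links resolve correctly.

```python
from urllib.parse import urljoin

# URL of the page the link was found on, used as the base for resolution.
page_url = "http://www.bestbuy.ca/en-CA/category/top-freezer-refrigerators/34734.aspx?type=product&page=1"

# Hypothetical href values, for illustration only.
root_relative = urljoin(page_url, "/en-CA/product/example/123.aspx")
doc_relative = urljoin(page_url, "details.aspx")
already_absolute = urljoin(page_url, "http://example.com/x")

print(root_relative)     # path replaced from the site root
print(doc_relative)      # last path segment of the page replaced
print(already_absolute)  # returned unchanged
```

An href that starts with "/" is resolved against the site root, a bare filename replaces the last path segment of the base, and an href that is already absolute comes back unchanged.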

Upvotes: 1

keiv.fly

Reputation: 4015

You should use ['href'] on the BeautifulSoup object to get the link as a string. Then just concatenate.

fullurl = mainurl + soup.find('h4',{"class":"prod-title"}).find('a')['href']

or

fullurl = mainurl + producturl['href']
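
Note that plain concatenation only gives a valid URL when the href starts with a single "/" and the base has no trailing slash. The sketch below compares it with urllib.parse.urljoin for a few hypothetical href values (assumed for illustration, not taken from the page).

```python
from urllib.parse import urljoin

mainurl = "http://www.bestbuy.ca"

# Hypothetical href values, for illustration only.
hrefs = [
    "/en-CA/product/example/123.aspx",        # root-relative
    "product/123.aspx",                       # no leading slash
    "http://www.bestbuy.ca/en-CA/home.aspx",  # already absolute
]

# Map each href to (concatenated result, urljoin result).
results = {href: (mainurl + href, urljoin(mainurl, href)) for href in hrefs}

for href, (concatenated, joined) in results.items():
    print(f"{href!r}\n  concat: {concatenated}\n  join:   {joined}")
```

Only the first href gives the same result both ways. For the other two, concatenation produces a malformed URL, while urljoin still returns a usable one.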

Upvotes: 0
