Reputation: 111
Let's say I'm crawling a webpage, and I scrape all of the links off of it. In python, how can I convert links like these:
Catalog.php
Products.aspx
Contact.html
to actual links like these:
https://example.com/Catalog.php
https://example.com/Products.aspx
https://example.com/Contact.html
I searched everywhere on stack overflow using the power of DuckDuckGo. Maybe there's a duplicate of this question, but I have no idea of how to phrase the question.
Upvotes: 0
Views: 570
Reputation: 307
Let's say you have the https://example.com as base path.
You can use the urljoin method from urllib.
Construct a full (“absolute”) URL by combining a “base URL” (base) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL.
import urllib.parse
base_path = "https://example.com/"
relative_path = "/Catalog.php"
new_url = urllib.parse.urljoin(base_path,relative_path)
You get
>>> https://example.com/Catalog.php
Upvotes: 1
Reputation: 527
import urllib.parse
urllib.parse.urljoin("https://example.com", "/Catalog.php")
Upvotes: 0