rom

Reputation: 111

How to convert "Short Links" from an "href" to an Actual URL?

Let's say I'm crawling a webpage and I scrape all of the links off of it. In Python, how can I convert relative links like these:

Catalog.php
Products.aspx
Contact.html

to absolute URLs like these:

https://example.com/Catalog.php
https://example.com/Products.aspx
https://example.com/Contact.html

I searched all over Stack Overflow using DuckDuckGo. There may well be a duplicate of this question, but I don't know how to phrase it.

Upvotes: 0

Views: 570

Answers (2)

HelloWorld

Reputation: 307

Let's say your base URL is https://example.com.

You can use the urljoin function from urllib.parse. From the documentation:

Construct a full (“absolute”) URL by combining a “base URL” (base) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL.

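import urllib.parse

# Base URL of the site the links were scraped from
base_path = "https://example.com/"

# A relative link as found in an href attribute
relative_path = "/Catalog.php"
new_url = urllib.parse.urljoin(base_path, relative_path)
print(new_url)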

Printing new_url gives:

https://example.com/Catalog.php
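
To apply this to the whole list of scraped links from the question, something like the following should work. It is only a minimal sketch: page_url is a hypothetical address for the page being crawled (the question does not give one), and hrefs holds the three example links from the question.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page the links were scraped from (an assumption).
page_url = "https://example.com/index.html"

# href values exactly as they appear in the question.
hrefs = ["Catalog.php", "Products.aspx", "Contact.html"]

# Resolve every relative link against the URL of the page it came from.
absolute_urls = [urljoin(page_url, href) for href in hrefs]

for url in absolute_urls:
    print(url)
```

Using the URL of the crawled page itself as the base, rather than only the domain, lets urljoin resolve links relative to the page's own location.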

Upvotes: 1

ImGeorge

Reputation: 527

import urllib.parse
urllib.parse.urljoin("https://example.com", "/Catalog.php")
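
Note that the leading slash matters once the base URL points somewhere below the site root. The sketch below illustrates this; page_url is a made-up address in a subdirectory, and example.org is just a stand-in for an external link, neither of which comes from the question.

```python
from urllib.parse import urljoin

# Hypothetical page URL inside a subdirectory, used only for illustration.
page_url = "https://example.com/shop/index.html"

# No leading slash: resolved relative to the page's directory.
relative_to_page = urljoin(page_url, "Catalog.php")

# Leading slash: resolved relative to the site root.
relative_to_root = urljoin(page_url, "/Catalog.php")

# An href that is already absolute is returned unchanged.
already_absolute = urljoin(page_url, "https://example.org/Contact.html")

print(relative_to_page)   # https://example.com/shop/Catalog.php
print(relative_to_root)   # https://example.com/Catalog.php
print(already_absolute)   # https://example.org/Contact.html
```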

Upvotes: 0
