Akira
Reputation: 2870

How to extract absolute URL of href with relative path?

I am trying to extract the download links from this page.

Here is the page source (viewing in Google Chrome) of that link:

[screenshot of the page source]

When I hover over ../matlab/licensing.pdf in the page source, the absolute URL https://www.mathworks.com/help/pdf_doc/matlab/licensing.pdf appears.

When I inspect ../matlab/licensing.pdf, however, the absolute URL does not appear in the right-hand panel; only the relative path is there. Thus I am unable to extract the full link with a regex in Python.

Please help me extract this link from the page source.

Upvotes: 2

Views: 1308

Answers (1)

Adam.Er8
Reputation: 13403

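Try using urllib.parse.urljoin. It resolves a relative path against the URL of the page it was found on, the same way the browser does.

Example: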

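import urllib.parse

# URL of the page that contains the link
base = "https://www.mathworks.com/help/pdf_doc/install/index.html"
# href value exactly as it appears in the page source
link_in_html = "../matlab/licensing.pdf"

# ".." steps up from /help/pdf_doc/install/ to /help/pdf_doc/
result = urllib.parse.urljoin(base, link_in_html)

print(result)  # https://www.mathworks.com/help/pdf_doc/matlab/licensing.pdf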
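
If you need every link on the page rather than a single one, the same urljoin call can be applied to each href the parser finds. Below is a minimal sketch using only the standard library. The LinkCollector class, the extract_absolute_links helper, the sample HTML fragment, and the install_guide.pdf file name are all made up for illustration; in practice you would feed in the HTML you downloaded from the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect every <a href> value, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest here.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn the relative path into an absolute URL.
                self.links.append(urljoin(self.base_url, value))


def extract_absolute_links(html, base_url):
    """Return all hrefs found in `html` as absolute URLs (hypothetical helper)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical fragment standing in for the downloaded page source.
sample_html = (
    '<a href="../matlab/licensing.pdf">Licensing</a> '
    '<a href="install_guide.pdf">Install</a>'
)
base = "https://www.mathworks.com/help/pdf_doc/install/index.html"

links = extract_absolute_links(sample_html, base)
print(links)
```

Using an HTML parser instead of a regex avoids most problems with attribute order and quoting. If you only want the PDFs, you could filter the resulting list with link.endswith(".pdf").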

Upvotes: 1
