How to extract absolute URL of href with relative path?

Question

I am trying to extract download links from this link.

Here is the page source (viewing in Google Chrome) of that link:

When I hover over ../matlab/licensing.pdf in the page source, the link https://www.mathworks.com/help/pdf_doc/matlab/licensing.pdf appears.

When I inspect ../matlab/licensing.pdf, however, the full link does not appear on the right-hand side, so I am unable to extract it with a regex in Python.

Please help me extract this link from the page source.

Adam.Er8 · Accepted Answer

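Try using urllib.parse.urljoin. The href in the page source is relative to the URL of the page it appears on, so joining the two gives the absolute URL.

Example: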

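import urllib.parse

# URL of the page that contains the link
base = r"https://www.mathworks.com/help/pdf_doc/install/index.html"
# href value exactly as it appears in the page source
link_in_html = r"../matlab/licensing.pdf"

# Resolve the relative href against the page URL
result = urllib.parse.urljoin(base, link_in_html)

print(result)  # https://www.mathworks.com/help/pdf_doc/matlab/licensing.pdf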
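
If you need every link on the page rather than just one, a rough sketch along these lines might help. It swaps the regex for the standard library's html.parser and runs each href through urljoin. The LinkCollector class and the sample_html string are made up for illustration (sample_html stands in for the page source you downloaded), so adapt them to the real page:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects <a href="..."> values as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin leaves already-absolute URLs unchanged
                self.links.append(urljoin(self.base_url, value))


# Assumed stand-in for the downloaded page source
sample_html = """
<a href="../matlab/licensing.pdf">Licensing</a>
<a href="install_guide.pdf">Install guide</a>
<a href="https://www.example.com/other.pdf">Other</a>
"""

base = "https://www.mathworks.com/help/pdf_doc/install/index.html"
collector = LinkCollector(base)
collector.feed(sample_html)

# Keep only the links that point at PDF files
pdf_links = [url for url in collector.links if url.lower().endswith(".pdf")]
print(pdf_links)
```

Using a parser instead of a regex means attribute order and quoting style in the HTML should not matter, and links that are already absolute pass through urljoin untouched.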
