Reputation: 2870
I am trying to extract the download links from this page.
Here is the page source of that page (viewed in Google Chrome):
When I hover over ../matlab/licensing.pdf
in the page source, the full link https://www.mathworks.com/help/pdf_doc/matlab/licensing.pdf
appears.
When I inspect ../matlab/licensing.pdf,
however, the full link does not appear in the right-hand panel, so I am unable to extract it with a regex in Python.
How can I get the full link from the page source?
Upvotes: 2
Views: 1308
Reputation: 13403
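The href in the page source is a relative URL; the browser resolves it against the address of the page it appears on. You can do the same in Python with urllib.parse.urljoin.
Example: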
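import urllib.parse
# URL of the page that contains the link (the base for resolution)
base = "https://www.mathworks.com/help/pdf_doc/install/index.html"
# href exactly as it appears in the page source
link_in_html = "../matlab/licensing.pdf"
# ".." steps up from install/ to pdf_doc/ before appending matlab/licensing.pdf
result = urllib.parse.urljoin(base, link_in_html)
print(result)  # https://www.mathworks.com/help/pdf_doc/matlab/licensing.pdf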
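If you need every PDF link on the page and not just one, a rough sketch is below. It swaps the regex for the standard library's html.parser and resolves each href with urljoin. The SAMPLE_HTML string is made up for illustration (I have not copied the real page source), and the names LinkCollector and extract_pdf_links are my own, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> fed to the parser."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs against the page URL
                self.links.append(urljoin(self.base_url, value))


def extract_pdf_links(html, base_url):
    """Return absolute URLs of all links whose target ends in .pdf."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if url.lower().endswith(".pdf")]


# Hypothetical snippet standing in for the real page source
SAMPLE_HTML = (
    '<ul>'
    '<li><a href="../matlab/licensing.pdf">Licensing</a></li>'
    '<li><a href="index.html">Home</a></li>'
    '</ul>'
)
BASE_URL = "https://www.mathworks.com/help/pdf_doc/install/index.html"

pdf_links = extract_pdf_links(SAMPLE_HTML, BASE_URL)
print(pdf_links)
```

In practice you would replace SAMPLE_HTML with the downloaded page text and BASE_URL with the address you downloaded it from.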
Upvotes: 1