albert

Reputation: 8583

Concatenate base url and path using urllib

I am trying to concatenate a base URL url1 and a relative path url2 using Python 3's urllib.parse, but I do not get the desired result. I also tried os.path.join (which is not meant for this purpose) and simple string concatenation using .format():

import os.path
import urllib.parse

url1 = "www.sampleurl.tld"
url2 = "/some/path/here"


print(urllib.parse.urljoin(url1, url2))
# --> "/some/path/here"

print(os.path.join(url1, url2))
# --> "/some/path/here"

print("{}{}".format(url1, url2))
# --> "www.sampleurl.tld/some/path/here" (desired output)

The simple string concatenation returns the desired absolute URL. However, this approach seems naive and inelegant, since it assumes that url2 starts with /, which may not be the case. Of course, I could check this by calling url2.startswith('/') and switch the string concatenation to "{}/{}".format(url1, url2) when it does not, but I am still wondering how to do this properly with urllib.parse.

Upvotes: 0

Views: 7200

Answers (2)

Ravi P

Reputation: 1

import urllib.parse

url1 = 'www.sampleurl.tld'
url2 = '/some/path/here'

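# Assemble the URL from its six components; the scheme has to be supplied explicitly.
parts = urllib.parse.ParseResult(scheme='https', netloc=url1, path=url2, params='', query='', fragment='')
print(urllib.parse.urlunparse(parts))
# --> "https://www.sampleurl.tld/some/path/here"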

You can try this. The URL is not created from a list; instead, it is built from an instance of the ParseResult class.
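To cover the case from the question where the path may or may not start with a slash, the ParseResult approach could be wrapped in a small helper. This is only a sketch: build_url is a hypothetical name, and the https default is an assumption, not something the question specifies.

```python
from urllib.parse import ParseResult, urlunparse


def build_url(host, path, scheme="https"):
    """Hypothetical helper: assemble scheme://host/path from a bare host name."""
    # Normalize the path so it has exactly one leading slash,
    # whether or not the caller supplied one.
    normalized_path = "/" + path.lstrip("/")
    parts = ParseResult(
        scheme=scheme,  # assumed default; pass "http" if that is what you need
        netloc=host,
        path=normalized_path,
        params="",
        query="",
        fragment="",
    )
    return urlunparse(parts)


print(build_url("www.sampleurl.tld", "/some/path/here"))
print(build_url("www.sampleurl.tld", "some/path/here"))
# Both calls print "https://www.sampleurl.tld/some/path/here"
```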

Upvotes: 0

flazzarini

Reputation: 8171

urljoin expects its first argument, the base URL, to include the scheme.

So adding https:// (or http://, for that matter) to your url1 string should do the job.

import urllib.parse

url1 = "https://www.sampleurl.tld"
url2 = "/some/path/here"


print(urllib.parse.urljoin(url1, url2))
# --> "https://www.sampleurl.tld/some/path/here"
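The reason becomes visible when the base is parsed on its own: without a scheme, the host name ends up in the path component, so urljoin has no network location to keep. The sketch below also shows how a trailing slash on the base changes the result; the /api prefix is a made-up example, not part of the question.

```python
from urllib.parse import urljoin, urlparse

# Without a scheme, the whole string is treated as a path.
print(urlparse("www.sampleurl.tld").netloc)          # prints an empty string
print(urlparse("www.sampleurl.tld").path)            # www.sampleurl.tld
print(urlparse("https://www.sampleurl.tld").netloc)  # www.sampleurl.tld

base = "https://www.sampleurl.tld"

# Against a bare host, a leading slash on the path makes no difference.
print(urljoin(base, "/some/path/here"))  # https://www.sampleurl.tld/some/path/here
print(urljoin(base, "some/path/here"))   # https://www.sampleurl.tld/some/path/here

# With a path on the base (hypothetical /api prefix), the trailing slash matters.
print(urljoin(base + "/api", "some/path"))    # https://www.sampleurl.tld/some/path
print(urljoin(base + "/api/", "some/path"))   # https://www.sampleurl.tld/api/some/path
print(urljoin(base + "/api/", "/some/path"))  # https://www.sampleurl.tld/some/path
```

So if the base carries its own path, keep a trailing slash on the base and drop the leading slash on the relative part; otherwise urljoin replaces the last segment or resets to the host root.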

Upvotes: 2
