Reputation: 1506
I am trying to build a URL by joining some dynamic components. I thought of using something like os.path.join(), BUT for URLs in my case. From research I found that urlparse.urljoin() does the same thing. However, it looks like it only takes two arguments at a time.
I have the following so far which works but looks repetitive:
a = urlparse.urljoin(environment, schedule_uri)
b = urlparse.urljoin(a, str(events_to_hours))
c = urlparse.urljoin(b, str(events_from_date))
d = urlparse.urljoin(c, str(api_version))
e = urlparse.urljoin(d, str(id))
url = e + '.json'
Output = http://example.com/schedule/12/20160322/v1/1.json
The above works and I tried to make it shorter this way:
url_join_items = [environment, schedule_uri, str(events_to_hours),
                  str(events_from_date), str(api_version), str(id), ".json"]
new_url = ""
for url_items in url_join_items:
    new_url = urlparse.urljoin(new_url, url_items)
Output: http://example.com/schedule/.json
But the second implementation does not work. Please suggest how to fix this, or a better way of doing it.
EDIT 1:
The output from the reduce solution looks like this (unfortunately):
Output: http://example.com/schedule/.json
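For reference, here is a minimal sketch of the behaviour (placeholder values, not my actual variables): urljoin() resolves each new piece as a relative reference, so a part without a trailing slash replaces the last path segment of the base.
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

base = "http://example.com/schedule/"
print(urljoin(base, "12"))                       # http://example.com/schedule/12
print(urljoin(urljoin(base, "12"), "20160322"))  # http://example.com/schedule/20160322 -- "12" is replaced

# Keeping a trailing "/" on every intermediate part makes the loop behave:
url = "http://example.com/"
for part in ["schedule", "12", "20160322", "v1", "1"]:
    url = urljoin(url, part + "/")
url = url.rstrip("/") + ".json"
print(url)                                       # http://example.com/schedule/12/20160322/v1/1.json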
Upvotes: 21
Views: 25268
Reputation: 14272
This is what worked for me:
from urllib.parse import urljoin


def join_url_parts(base: str, parts: list[str], allow_fragments: bool = True) -> str:
    """Join multiple URL parts together.

    See the examples below. All of them would produce the same result:
    `https://example.com/api/v1/users/`

    print(join_url_parts("https://example.com", ["api", "v1", "users"]))
    print(join_url_parts("https://example.com", ["api", "v1/", "users"]))
    print(join_url_parts("https://example.com/", ["api/", "v1/", "users/"]))
    print(join_url_parts("https://example.com/", ["/api/", "/v1/", "users/"]))
    """
    url = "/".join(map(lambda x: str(x).strip("/"), parts)) + "/"
    return urljoin(base, url, allow_fragments)
This basically replicates the standard urljoin, but allows the second argument to be a list of parts (strings).
Upvotes: 0
Reputation: 554
A simple solution would be:
import re


def url_join(*parts: str) -> str:
    # Join the parts, collapse any run of slashes, then restore the "//"
    # after the scheme (e.g. "https://").
    line = '/'.join(parts)
    line = re.sub('/{2,}', '/', line)
    return re.sub(':/', '://', line)
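For example (the part values below are just an illustration, not the asker's actual variables):
print(url_join("https://example.com/", "/schedule", "12/", "20160322", "v1", "1.json"))
# -> https://example.com/schedule/12/20160322/v1/1.json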
Upvotes: 2
Reputation: 37154
Here's a slightly silly but workable solution, given that parts is a list of URL parts, in order:
my_url = '/'.join(parts).replace('//', '/').replace(':/', '://')
I wish replace had a "from" option, but it does not, hence the second call is there to recover the double slash in https://.
The nice thing is that you don't have to worry about the parts already having (or not having) slashes.
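For instance (the parts below are just an illustration):
parts = ["https://example.com", "schedule/", "12", "20160322/", "v1", "1.json"]
my_url = '/'.join(parts).replace('//', '/').replace(':/', '://')
print(my_url)  # https://example.com/schedule/12/20160322/v1/1.json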
Upvotes: 0
Reputation: 9709
I also needed something similar and came up with this solution:
from urllib.parse import urljoin, quote_plus

def multi_urljoin(*parts):
    return urljoin(parts[0], "/".join(quote_plus(part.strip("/"), safe="/") for part in parts[1:]))
print(multi_urljoin("https://server.com", "path/to/some/dir/", "2019", "4", "17", "some_random_string", "image.jpg"))
This prints 'https://server.com/path/to/some/dir/2019/4/17/some_random_string/image.jpg'
Upvotes: 9
Reputation: 2199
Using join
Have you tried simply "/".join(url_join_items)? Doesn't HTTP always use the forward slash? You might have to manually set up the "https://" prefix and the suffix, though.
Something like:
url = "https://{}.json".format("/".join(url_join_items))
Using reduce and urljoin
Here is a related question on SO that explains to some degree the thinking behind the implementation of urljoin. Your use case does not appear to be the best fit.
When using reduce and urljoin, I'm not sure it will do what the question intends, which is semantically like os.path.join, but for URLs. Consider the following:
from urllib.parse import urljoin
from functools import reduce

parts_1 = ["a", "b", "c", "d"]
parts_2 = ["https://", "server.com", "somedir", "somefile.json"]
parts_3 = ["https://", "server.com/", "somedir/", "somefile.json"]

out1 = reduce(urljoin, parts_1)
print(out1)
# d

out2 = reduce(urljoin, parts_2)
print(out2)
# https:///somefile.json

out3 = reduce(urljoin, parts_3)
print(out3)
# https:///server.com/somedir/somefile.json
Note that, with the exception of the extra "/" after the https prefix, the third output is probably closest to what the asker intends, though we've had to do all the work of formatting the parts with the separator ourselves.
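As a further sketch (not one of the outputs above), if the scheme and host are kept together and the intermediate parts end with "/", reduce does produce the intended result:
# continuing the snippet above (urljoin and reduce already imported)
parts_4 = ["https://server.com/", "somedir/", "somefile.json"]
out4 = reduce(urljoin, parts_4)
print(out4)
# https://server.com/somedir/somefile.json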
Upvotes: 25