Reputation: 4267
Given a list like
['http:host1', 'http:host2', 'http:host3', 'https:host1', 'https:host4']
I want to produce a list of pairs where each pair has the same host but a different scheme:
[('http:host1', 'https:host1'), ('http:host2'), ...]
I can filter by scheme quite easily:
with_https = [x for x in li if x.startswith('https')]
but I cannot think of an elegant way to group by host.
Upvotes: 0
Views: 62
Reputation: 14233
Using urllib.parse and collections.defaultdict:
from collections import defaultdict
from urllib.parse import urlparse
grouped_urls = defaultdict(list)
urls = ['http:host1', 'http:host2', 'http:host3', 'https:host1', 'https:host4']
for url in urls:
    # For 'http:host1', urlparse puts 'host1' in .path (there is no '//' netloc part)
    grouped_urls[urlparse(url).path].append(url)
print(grouped_urls)
output:
defaultdict(<class 'list'>, {'host1': ['http:host1', 'https:host1'], 'host2': ['http:host2'], 'host3': ['http:host3'], 'host4': ['https:host4']})
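If you need tuples rather than a dict, as in the question, the grouped values can be converted directly. This is a minimal sketch that assumes a host with only one scheme should become a one-element tuple; the name `pairs` is only for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

urls = ['http:host1', 'http:host2', 'http:host3', 'https:host1', 'https:host4']

# Group URLs by host; for 'scheme:host' strings the host ends up in .path
grouped_urls = defaultdict(list)
for url in urls:
    grouped_urls[urlparse(url).path].append(url)

# Turn each group into a tuple (dicts keep insertion order in Python 3.7+)
pairs = [tuple(group) for group in grouped_urls.values()]
print(pairs)
# [('http:host1', 'https:host1'), ('http:host2',), ('http:host3',), ('https:host4',)]
```

To keep only hosts that appear with both schemes, filter on the group size, for example `[p for p in pairs if len(p) == 2]`.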
Upvotes: 4
Reputation: 1919
You didn't give us the entire output you want, but it seems like this code would help you achieve it:
urls = ['http:host1', 'http:host2', 'http:host3', 'https:host1', 'https:host4']
new_urls = [(x, x.replace("p", "ps", 1) if x[4] != "s" else x.replace("ps", "p", 1)) for x in urls]
print(new_urls)
And the output is
[('http:host1', 'https:host1'), ('http:host2', 'https:host2'), ('http:host3', 'https:host3'), ('https:host1', 'http:host1'), ('https:host4', 'http:host4')]
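Note that the comprehension above builds a counterpart for every URL, whether or not that counterpart is in the input. If only pairs whose two members both appear in the list are wanted, a set lookup may be enough. This is a sketch under that assumption; `present` and `both` are illustrative names.

```python
urls = ['http:host1', 'http:host2', 'http:host3', 'https:host1', 'https:host4']

# Set for fast membership checks
present = set(urls)

# For each 'http:' URL, keep it only if its 'https:' twin is also in the input
both = [
    (u, "https" + u[4:])
    for u in urls
    if u.startswith("http:") and "https" + u[4:] in present
]
print(both)
# [('http:host1', 'https:host1')]
```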
Upvotes: 0