Reputation: 4267

Segregate list of urls based on schema criteria

Given a list like

['http:host1', 'http:host2', 'http:host3', 'https:host1', 'https:host4']

I want to produce a list of tuples in which each tuple groups the URLs that share a host but differ in scheme:

[('http:host1', 'https:host1'), ('http:host2'), ...]

I can filter by scheme quite easily:

with_https = [x for x in li if x.startswith('https')]

but I cannot think of an elegant way to group by host.

Upvotes: 0

Answers (2)

buran

Reputation: 14233

Using urllib.parse and collections.defaultdict:

from collections import defaultdict
from urllib.parse import urlparse

grouped_urls = defaultdict(list)

urls = ['http:host1', 'http:host2', 'http:host3', 'https:host1', 'https:host4']

for url in urls:
    grouped_urls[urlparse(url).path].append(url)

print(grouped_urls)

output:

defaultdict(<class 'list'>, {'host1': ['http:host1', 'https:host1'], 'host2': ['http:host2'], 'host3': ['http:host3'], 'host4': ['https:host4']})
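
If the goal is the list of tuples shown in the question rather than a dict, the grouped values can be converted in one more step. This is only a sketch: the names pairs and both_schemes are made up for illustration, and it assumes every URL has the scheme:host form used above (no "//"), so that urlparse puts the host in .path.

```python
from collections import defaultdict
from urllib.parse import urlparse

urls = ['http:host1', 'http:host2', 'http:host3', 'https:host1', 'https:host4']

# Group by host; for 'scheme:host' strings urlparse leaves the host in .path
grouped_urls = defaultdict(list)
for url in urls:
    grouped_urls[urlparse(url).path].append(url)

# One tuple per host; sorting puts 'http:...' before 'https:...'
pairs = [tuple(sorted(group)) for group in grouped_urls.values()]
print(pairs)

# Optionally keep only hosts that appear with more than one URL
both_schemes = [p for p in pairs if len(p) > 1]
print(both_schemes)
```

Hosts that appear with a single scheme come out as one-element tuples such as ('http:host2',), which is probably the closest match to the ('http:host2') entry in the question.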

Upvotes: 4

Jonathan1609

Reputation: 1919

You didn't show the entire output you want, so here is some code that may help you achieve it:

urls = ['http:host1', 'http:host2', 'http:host3', 'https:host1', 'https:host4']
new_urls = [(x, x.replace("p", "ps", 1) if x[4] != "s" else x.replace("ps", "p", 1)) for x in urls]
print(new_urls)

The output is:

[('http:host1', 'https:host1'), ('http:host2', 'https:host2'), ('http:host3', 'https:host3'), ('https:host1', 'http:host1'), ('https:host4', 'http:host4')]

Upvotes: 0
