Bijan

Reputation: 8648

Remove Duplicate Strings Based on a Pattern

I have a list of URLs in the format http://WEBSITE.com/XXXXX/YYYYY, where X and Y are random characters.

How can I have Python keep only the results with distinct, case-insensitive XXXXX values? It does not matter whether the YYYYY portion is kept.

Upvotes: 0

Views: 78

Answers (3)

RobertB

Reputation: 1929

Use a set comprehension:

values = {url.split("/")[3].lower() for url in url_list}  # lower() makes the comparison case-insensitive
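If the full URLs should be kept rather than just the XXXXX values, one possible variation is a dict keyed by the lowercased segment. This is only a sketch: the function name `dedupe_by_segment` is made up for illustration, and it assumes every URL has exactly the http://WEBSITE.com/XXXXX/YYYYY shape from the question, so that index 3 of the split is XXXXX.

```python
def dedupe_by_segment(url_list):
    """Keep the first URL seen for each distinct, case-insensitive XXXXX segment."""
    seen = {}
    for url in url_list:
        # "http://WEBSITE.com/XXXXX/YYYYY".split("/") gives
        # ["http:", "", "WEBSITE.com", "XXXXX", "YYYYY"], so index 3 is XXXXX.
        key = url.split("/")[3].lower()
        # setdefault stores the URL only if this key has not been seen yet.
        seen.setdefault(key, url)
    return list(seen.values())


urls = [
    "http://WEBSITE.com/abc/111",
    "http://WEBSITE.com/ABC/222",
    "http://WEBSITE.com/xyz/333",
]
print(dedupe_by_segment(urls))
```

On Python 3.7+ dicts preserve insertion order, so the result keeps the order in which each distinct segment first appeared.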

Upvotes: 1

salezica

Reputation: 77029

Well, you can easily shave off the last part of the path:

id = "/".join(url.split('/')[:-1]) # split, lose last item, rejoin

Then put your IDs in a set() to keep them unique:

ids = set()
ids.add(id)
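Put together over a whole list, the two steps above might look like the sketch below. The function name `unique_prefixes` and the sample URLs are assumptions for illustration. The `.lower()` call is an addition to handle the case-insensitive requirement from the question, and the code assumes no URL ends with a trailing slash.

```python
def unique_prefixes(url_list):
    """Collect the distinct URL prefixes (everything before the last path part)."""
    ids = set()
    for url in url_list:
        # Split, lose the last item, rejoin: ".../XXXXX/YYYYY" -> ".../XXXXX"
        prefix = "/".join(url.split("/")[:-1])
        # Lowercase so that "abc" and "ABC" count as the same value.
        ids.add(prefix.lower())
    return ids


urls = [
    "http://WEBSITE.com/abc/111",
    "http://WEBSITE.com/ABC/222",
    "http://WEBSITE.com/xyz/333",
]
print(unique_prefixes(urls))
```

Note that this lowercases the whole prefix, host name included, which is harmless here because host names are case-insensitive anyway.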

Upvotes: 2

Eugene K

Reputation: 3457

Look at rsplit() and then use a set. rsplit() splits a string on a delimiter such as '/', working from the right, and a set holds only unique elements.

rsplit(): https://docs.python.org/2/library/stdtypes.html
set: https://docs.python.org/2/library/stdtypes.html#set
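A minimal sketch of the rsplit() plus set idea, assuming each URL ends in /XXXXX/YYYYY with no trailing slash. The function name `distinct_segments` is hypothetical, and lowercasing is added to meet the case-insensitive requirement in the question.

```python
def distinct_segments(url_list):
    """Return the set of distinct, lowercased XXXXX segments."""
    segments = set()
    for url in url_list:
        # rsplit with maxsplit=2 splits twice from the right:
        # "http://WEBSITE.com/XXXXX/YYYYY" -> ["http://WEBSITE.com", "XXXXX", "YYYYY"]
        _, segment, _ = url.rsplit("/", 2)
        segments.add(segment.lower())
    return segments


urls = [
    "http://WEBSITE.com/abc/111",
    "http://WEBSITE.com/ABC/222",
    "http://WEBSITE.com/xyz/333",
]
print(distinct_segments(urls))
```

Splitting from the right means the scheme and host never need to be parsed, as long as the last two path parts are always present.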

Upvotes: 2
