jesse

Reputation: 658

Filter List of Strings By Keys

My project has required this enough times that I'm hoping someone on here can give me an elegant way to write it.

I have a list of strings and would like to filter out duplicates using a key or key-like function (as I can with sorted(foo, key=bar)).

Most recently, I'm dealing with links.

Currently I have to create an empty list and add each value only if its name isn't already in the list.

Note: name() returns the name of the file the link points to; it is just a regex match.

parsed_links = ["http://www.host.com/3y979gusval3/name_of_file_1",          
                "http://www.host.com/6oo8wha55crb/name_of_file_2", 
                "http://www.host.com/6gaundjr4cab/name_of_file_3",                
                "http://www.host.com/udzfiap79ld/name_of_file_6", 
                "http://www.host.com/2bibqho4mtox/name_of_file_5", 
                "http://www.host.com/4a31wozeljsp/name_of_file_4"]

links = []
[links.append(link) for link in parsed_links if not name(link) in 
             [name(lnk) for lnk in links]]

I want the final list to have the full links (so I can't just get rid of everything but the filenames and use set); but I'd like to be able to do this without creating an empty list every time.

Also, my current method seems inefficient (which is significant as it is often dealing with hundreds of links).

Any suggestions?

Upvotes: 2

Views: 128

Answers (2)

sloth

Reputation: 101072

Why not just use a dictionary?

links = dict((name(link), link) for link in parsed_links)
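
As a rough sketch of how the dictionary idea could be used to get the full links back: the keys are the file names and the values are the links, so taking the values gives the filtered list. The name() helper below is only a hypothetical stand-in for the asker's regex, and the third link is a made-up duplicate added for illustration. Note that a plain dict(...) over all pairs keeps the last link seen for each name; setdefault is used here, as one possible choice, to keep the first instead.

```python
def name(link):
    # Hypothetical stand-in for the asker's regex-based helper:
    # treat everything after the last slash as the file name.
    return link.rsplit("/", 1)[-1]

parsed_links = [
    "http://www.host.com/3y979gusval3/name_of_file_1",
    "http://www.host.com/6oo8wha55crb/name_of_file_2",
    "http://www.host.com/aaaaaaaaaaaa/name_of_file_1",  # made-up duplicate name
]

by_name = {}
for link in parsed_links:
    # setdefault only stores the link if this name has not been seen yet,
    # so the first link for each file name wins.
    by_name.setdefault(name(link), link)

# Dicts preserve insertion order on Python 3.7+, so this keeps the
# original ordering of the first occurrences.
links = list(by_name.values())
```

Each lookup is a hash lookup rather than a scan of a list, so this should scale comfortably to hundreds of links.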

Upvotes: 3

Frédéric Hamidi

Reputation: 263009

If I understand your question correctly, your performance problems may come from the list comprehension that is repeatedly evaluated in a tight loop.

Try caching the result by putting the list comprehension outside of the loop, then use another comprehension instead of append() on an empty list:

linkNames = [name(lnk) for lnk in links]
links = [link for link in parsed_links if name(link) not in linkNames]
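
If duplicates can also occur within parsed_links itself, a cached list of names computed up front will not catch them, because it is not updated as links are accepted. One possible variation, sketched here under the assumption that the key values are hashable, is to track the names seen so far in a set while building the result. The function unique_by and the sample name() helper are hypothetical names chosen for this example, not part of the question.

```python
def unique_by(items, key):
    """Return items in order, keeping only the first item for each key(item)."""
    seen = set()      # keys encountered so far (must be hashable)
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


def name(link):
    # Hypothetical stand-in for the asker's regex-based helper.
    return link.rsplit("/", 1)[-1]


parsed_links = [
    "http://www.host.com/3y979gusval3/name_of_file_1",
    "http://www.host.com/6oo8wha55crb/name_of_file_2",
    "http://www.host.com/bbbbbbbbbbbb/name_of_file_2",  # made-up duplicate name
]

links = unique_by(parsed_links, key=name)
```

This mirrors the key= style of sorted(), and each membership check against the set takes roughly constant time instead of rescanning a list.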

Upvotes: 0
