David542

Reputation: 110572

Removing duplicates in python list

I have the following list of titles:

titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']

How would I get the following:

titles = ['Saw (US)', 'Dear Sally (SE)']

Basically, I need to remove the duplicate titles. It doesn't matter which territory shows, as long as there is only one (i.e., I can remove any duplicate).

Here is what I have tried, unsuccessfully:

[title for title in localized_titles if title.split(' (')[0] not in localized_titles]

Upvotes: 2

Views: 207

Answers (7)

noio

Reputation: 5812

If that is really the exact format of your titles, first build the list of stems, then keep a title only when its stem has not already appeared earlier in that list:

generic_titles = [t.split(' (')[0] for t in titles]
titles = [t for i, t in enumerate(titles) if generic_titles[i] not in generic_titles[:i]]

However, this breaks when a title contains other parentheses.
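
To illustrate that failure mode: splitting on the first ' (' truncates a title that has parentheses of its own, whereas stripping only a trailing parenthesised group does not. This is a minimal sketch, assuming the territory is always the last parenthesised group; the helper name `stem` and the sample title 'Saw (2004) (US)' are made up for illustration.

```python
import re

def stem(title):
    # Strip only a final " (...)" group; earlier parentheses are kept.
    return re.sub(r' \([^()]*\)$', '', title)

# Hypothetical titles with an extra parenthesised part before the territory.
titles = ['Saw (2004) (US)', 'Saw (2004) (AU)', 'Dear Sally (SE)']

seen = set()
unique = []
for title in titles:
    key = stem(title)
    if key not in seen:
        # First time this stem appears: remember it and keep the title.
        seen.add(key)
        unique.append(title)
```

With plain `title.split(' (')[0]`, the first sample title would be reduced to 'Saw', losing the '(2004)' part.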

Upvotes: 1

jamylak

Reputation: 133764

>>> from collections import OrderedDict
>>> titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']
>>> list(OrderedDict((t.rpartition(' (')[0], t) for t in titles).values())
['Saw (AU)', 'Dear Sally (SE)']

Upvotes: 1

cmd

Reputation: 5830

Fast, and preserves order:

seen = set()
[title for title in titles
 if title.split(' (')[0] not in seen and not seen.add(title.split(' (')[0])]

Upvotes: 0

qwwqwwq

Reputation: 7329

For the sake of code golf:

titles = ['('.join(x) for x in dict([x.split('(') for x in titles]).items()]

This assumes each title contains exactly one ( character, the one that opens the territory code.

Upvotes: 0

David542

Reputation: 110572

Here's a roundabout way of getting there:

localized_titles, existing_stems = [], []
for item in titles:
    stem = item.split(' (')[0]
    if stem not in existing_stems:
        existing_stems.append(stem)
        localized_titles.append(item)

Upvotes: 1

Peter DeGlopper

Reputation: 37364

I'm not sure this is the most elegant solution, but it should work: you can use the title without its territory as a dict key.

unique_titles = dict((title.rsplit(' (', 1)[0], title) for title in titles)

If you need to preserve order, use an OrderedDict instead.

unique_titles.values() then gives the titles including territories, one per title.

The optional second argument to rsplit limits it to at most one split, and rsplit (rather than split) starts looking for the parenthesis from the end of the string rather than the beginning.
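
A sketch of the dict approach on the question's data. It assumes Python 3.7 or later, where a plain dict preserves insertion order; on older versions collections.OrderedDict would be needed for that. The names `last_wins` and `first_wins` are illustrative. Building the dict directly keeps the last territory seen for each title, while setdefault keeps the first.

```python
titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']

# Later entries overwrite earlier ones, so the last territory wins.
last_wins = dict((t.rsplit(' (', 1)[0], t) for t in titles)

# setdefault only stores a value when the key is new, so the first wins.
first_wins = {}
for t in titles:
    first_wins.setdefault(t.rsplit(' (', 1)[0], t)

print(list(last_wins.values()))   # ['Saw (AU)', 'Dear Sally (SE)']
print(list(first_wins.values()))  # ['Saw (US)', 'Dear Sally (SE)']
```

Since the question says any territory will do, either variant should be acceptable.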

Upvotes: 2

Senjai

Reputation: 1816

Try using a dictionary to keep track of which items in the list you have already seen. Let each key in the dictionary be a value from the list, and let the dictionary value be True or False depending on whether that item has been seen yet.

You can then iterate through the list, adding each new item to the dictionary and removing items from the list if they already exist in the dictionary. It's how I do it, but I'm still learning.
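
The idea described here could be sketched as follows. This version builds a new list rather than removing items from the original while iterating over it, since mutating a list during iteration tends to skip elements. The names `seen` and `unique_titles` are illustrative, and the key is assumed to be the title without its territory.

```python
titles = ['Saw (US)', 'Saw (AU)', 'Dear Sally (SE)']

seen = {}            # stem -> True once that stem has been encountered
unique_titles = []

for title in titles:
    stem = title.split(' (')[0]
    if not seen.get(stem, False):
        # Not seen before: mark it and keep this title.
        seen[stem] = True
        unique_titles.append(title)
```

A set would serve equally well here, because the dictionary values are only ever True.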

Upvotes: 0
