bornytm

Reputation: 813

List/Dict Data Manipulation - Delete Duplicates

After doing some web scraping and combining the results, I'm left with a list of dictionaries. One of the keys (titles) maps to a list of lists.

 thelist = [{"name":"a name", "titles":[["foo","bar", ... ],["foo","baz",["..."], ... ]]},
{"name":"another name", "titles":[["foo","bar", ... ],["foo","baz",["..."], ... ]]}, ... ]

The goal, for each dictionary, is to eliminate titles that appear in more than one of its title lists and to replace the list of lists with a single flat list of titles (with no duplicates).

The code I have written right now accesses all the items in the list of lists correctly, but I'm having difficulty actually doing the elimination of duplicates.

match = ""
for dicts in thelist:
    for listoftitles in dicts['titles']:
        for title in listoftitles:
            title = match
        for title in listoftitles:
            if match == title:
                print title
                #del title

It seems that match is never equal to the value in title. I've tried changing the nesting of the loops but so far to no avail. I'm getting lost somewhere and I'm not sure what else to try. Any advice is greatly appreciated.

Upvotes: 1

Views: 164

Answers (2)

Padraic Cunningham

Reputation: 180441

Dicts are mutable, so you can update each dict in the original list in place, using itertools.chain.from_iterable to flatten the list of lists and set to drop the duplicates:

l = [{'name': 'a name', 'titles': [['foo','bar'],['foo','baz']]}]

from itertools import chain
for d in l:
    d["titles"] = list(set(chain.from_iterable(d["titles"])))

print(l)

Output (a set is unordered, so the titles may come out in a different order):

[{'titles': ['bar', 'baz', 'foo'], 'name': 'a name'}]

If you want to keep the titles in the order each one was first seen, you can use an OrderedDict to remove the duplicates instead:

from itertools import chain
from collections import OrderedDict

for d in l:
    d["titles"] = list(OrderedDict.fromkeys(chain.from_iterable(d["titles"])))

print(l)

Output:

[{'name': 'a name', 'titles': ['foo', 'bar', 'baz']}]
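
The order-preserving version above can also be sketched without OrderedDict, assuming Python 3.7 or later, where a plain dict is guaranteed to keep insertion order. The function name dedupe_titles is hypothetical and only here for illustration; like the loops above, it modifies the dicts in place.

```python
from itertools import chain

def dedupe_titles(records):
    """Flatten each record's 'titles' list of lists into one list,
    dropping duplicates while keeping first-seen order (in place)."""
    for record in records:
        flat = chain.from_iterable(record["titles"])
        # dict.fromkeys keeps only the first occurrence of each key, in order
        record["titles"] = list(dict.fromkeys(flat))
    return records

l = [{'name': 'a name', 'titles': [['foo', 'bar'], ['foo', 'baz']]}]
print(dedupe_titles(l))
# [{'name': 'a name', 'titles': ['foo', 'bar', 'baz']}]
```

As with set, this only works if every title is hashable (strings are; nested lists are not).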

Upvotes: 0

Paul Draper

Reputation: 83263

The idiomatic way to get a list without duplicates is list(set(some_iterable)). Note that this does not preserve the original order and requires the items to be hashable.

Throw in a list comprehension and we get

thelist = [{'name': 'a name', 'titles': [['foo','bar'],['foo','baz']]}]

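# Build new dicts rather than modifying thelist in place
print([
    {
        'name': d['name'],
        # flatten the list of lists, then let set() drop the duplicates
        'titles': list(set(title for lst in d['titles'] for title in lst))
    }
    for d in thelist
])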

prints (the order of the titles may vary, since sets are unordered)

[{'name': 'a name', 'titles': ['baz', 'foo', 'bar']}]
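
Because the order coming out of a set is arbitrary, the result above can differ between runs or interpreters. If a repeatable order matters, one possible variation, assuming alphabetical order is acceptable for the titles, is to wrap the set in sorted(). The variable name result is only for illustration.

```python
thelist = [{'name': 'a name', 'titles': [['foo', 'bar'], ['foo', 'baz']]}]

result = [
    {
        'name': d['name'],
        # the set comprehension removes duplicates; sorted() returns a new list
        'titles': sorted({title for lst in d['titles'] for title in lst}),
    }
    for d in thelist
]
print(result)
# [{'name': 'a name', 'titles': ['bar', 'baz', 'foo']}]
```

This builds new dicts and leaves thelist untouched, the same as the comprehension above.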

Upvotes: 1
