Reputation: 813
After doing some web scraping and combining the results, I'm left with a list of dictionaries. One of the keys (titles) holds a list of lists.
thelist = [{"name":"a name", "titles":[["foo","bar", ... ],["foo","baz",["..."], ... ]]},
{"name":"another name", "titles":[["foo","bar", ... ],["foo","baz",["..."], ... ]]}, ... ]
The goal is to eliminate titles that appear in more than one list within the list of titles in each dictionary and to replace the list of lists of titles with a single list of titles (with no duplicates).
The code I have written right now accesses all the items in the list of lists correctly, but I'm having difficulty actually doing the elimination of duplicates.
match = ""
for dicts in thelist:
    for listoftitles in dicts['titles']:
        for title in listoftitles:
            title = match
            for title in listoftitles:
                if match == title:
                    print title
                    #del title
It seems that match is never equal to the value in title. I've tried changing the nesting of the loops but so far to no avail. I'm getting lost somewhere and I'm not sure what else to try. Any advice is greatly appreciated.
Upvotes: 1
Views: 164
Reputation: 180441
Dicts are mutable, so you can update each dict in the original list in place, using itertools.chain.from_iterable to flatten the list of lists and set to drop the duplicates:
l = [{'name': 'a name', 'titles': [['foo','bar'],['foo','baz']]}]
from itertools import chain
for d in l:
    d["titles"] = list(set(chain.from_iterable(d["titles"])))
print(l)
Output (the order of the titles may vary, because sets are unordered):
[{'titles': ['bar', 'baz', 'foo'], 'name': 'a name'}]
If you want to keep the order in which each title was first seen, you can use OrderedDict.fromkeys to remove the duplicates:
from itertools import chain
from collections import OrderedDict
for d in l:
    d["titles"] = list(OrderedDict.fromkeys(chain.from_iterable(d["titles"])))
print(l)
Output:
[{'name': 'a name', 'titles': ['foo', 'bar', 'baz']}]
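One caveat: the sample data in the question shows sublists that themselves contain a list (["foo","baz",["..."], ... ]). chain.from_iterable only flattens one level, and both set and OrderedDict.fromkeys raise a TypeError on a list because lists are unhashable. If the real data is nested like that, a recursive flatten might help. This is only a sketch, assuming every title is a string and every container is a list; flatten is a helper name made up here, and "qux" is invented sample data.

```python
from collections import OrderedDict

def flatten(items):
    """Yield every non-list element of items, descending into nested lists."""
    for item in items:
        if isinstance(item, list):
            # Recurse into nested lists of any depth.
            for sub in flatten(item):
                yield sub
        else:
            yield item

# Hypothetical data: the second sublist contains a further nested list.
thelist = [{"name": "a name",
            "titles": [["foo", "bar"], ["foo", "baz", ["qux", "bar"]]]}]

for d in thelist:
    # OrderedDict.fromkeys keeps the first occurrence of each title, in order.
    d["titles"] = list(OrderedDict.fromkeys(flatten(d["titles"])))

print(thelist)
```

With this input, each dict ends up with the titles foo, bar, baz, qux, in the order they were first seen.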
Upvotes: 0
Reputation: 83263
The idiomatic way to get a list without duplicates is list(set(some_iterable)), provided the items are hashable and their order does not matter.
Throw in a list comprehension and we get:
thelist = [{'name': 'a name', 'titles': [['foo','bar'],['foo','baz']]}]
print([
    {
        'name': d['name'],
        'titles': list(set(title for lst in d['titles'] for title in lst))
    }
    for d in thelist
])
which prints (the order of the titles may vary, since sets are unordered):
[{'name': 'a name', 'titles': ['baz', 'foo', 'bar']}]
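As for why the loop in the question never finds a match: title = match assigns the empty string to title rather than storing the title in match, so match stays "" and the later comparison never equals a real title. If you would rather keep explicit loops, something like the following should work. It is a minimal sketch, assuming the titles are hashable strings; seen and unique are names introduced here for illustration.

```python
thelist = [{'name': 'a name', 'titles': [['foo', 'bar'], ['foo', 'baz']]}]

for d in thelist:
    seen = set()    # titles already encountered in this dict
    unique = []     # titles in first-seen order, without duplicates
    for listoftitles in d['titles']:
        for title in listoftitles:
            if title not in seen:
                seen.add(title)
                unique.append(title)
    # Replace the list of lists with the single de-duplicated list.
    d['titles'] = unique

print(thelist)
```

Unlike the set-based one-liner, this keeps each title in the position where it first appeared.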
Upvotes: 1