dan martin

Reputation: 1317

Python - comparing lists of dictionaries using tuples - unexpected behaviour?

I've been attempting to compare two lists of dictionaries to find the userids of new people in list2 that aren't in list1. For example, the first list:

list1 = [{"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"}, {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"}]

and the second list:

list2 = [{"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"}, {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"}, {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"}]

the desired output:

newpeople = ['34892']

This is what I've managed to put together:

list1tuple = ((d["userid"]) for d in list1)
list2tuple = ((d["userid"]) for d in list2)

newpeople = [t for t in list2tuple if t not in list1tuple]

This actually seems to be pretty efficient, especially considering the lists I am using might contain over 50,000 dictionaries. However, here's the issue:

If it finds a userid in list2 that indeed isn't in list1, it adds it to newpeople (as desired), but then also adds every other userid that comes afterwards in list2 to newpeople as well.

So, say list2 contains 600 userids and the 500th userid in list2 isn't found anywhere in list1, the first item in newpeople will be the 500th userid (again, as desired), but then followed by the other 100 userids that came after the new one.

This is pretty perplexing to me - I'd greatly appreciate anyone helping me get to the bottom of why this is happening.

Upvotes: 1

Views: 51

Answers (2)

Alexander

Reputation: 109546

As can be seen from a Python console, list1tuple and list2tuple are generators:

>>> ((d["userid"]) for d in list1)
<generator object <genexpr> at 0x10a9936e0>

Although the second one can remain a generator (there is no need to expand the list), the first one should first be converted to a list, set or tuple, e.g.:

list1set = {d['userid'] for d in list1}
list2generator = (d['userid'] for d in list2)

You can now check for membership in the group:

>>> [t for t in list2generator if t not in list1set]
['34892']

Upvotes: 1

donkopotamus

Reputation: 23176

Currently you have set list1tuple and list2tuple as:

list1tuple = ((d["userid"]) for d in list1)
list2tuple = ((d["userid"]) for d in list2)

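These are generators, not lists (or tuples), which means each can only be iterated over once, and that is causing your problem. Every membership test against list1tuple consumes items from it until a match is found. The first userid that has no match drains the generator completely, so every later test runs against an empty generator and reports the userid as new.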
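To see the exhaustion in isolation, here is a minimal sketch. The ids and the names old_ids and new_ids are made up for illustration; they are not from the question's data.

```python
# Hypothetical ids, chosen only to reproduce the behaviour from the question.
old_ids = (uid for uid in ["1", "2", "3", "4"])
new_ids = (uid for uid in ["1", "2", "9", "3", "4"])

# Each "not in" test advances old_ids until it finds a match.
# Testing "9" finds no match and drains old_ids completely,
# so the later tests for "3" and "4" run against an empty generator.
newpeople = [t for t in new_ids if t not in old_ids]

print(newpeople)  # ['9', '3', '4']
```

Only "9" is genuinely new here, yet "3" and "4" are reported as well, which matches the symptom described in the question.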

You could change them to be lists:

list1tuple = [d["userid"] for d in list1]
list2tuple = [d["userid"] for d in list2]

which would allow you to iterate over them as many times as you like. But a better solution would be to simply make them sets:

list1tuple = set(d["userid"] for d in list1)
list2tuple = set(d["userid"] for d in list2)

And then take the set difference (note that the result is a set, not a list):

newpeople = list2tuple - list1tuple
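If you need a list in list2's original order, as in the desired output, one possible sketch is to keep only the first collection as a set and filter list2 against it. The names known_ids and new_as_set are my own, and the data is the question's example with the missing commas added.

```python
list1 = [
    {"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"},
    {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"},
]
list2 = list1 + [
    {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"},
]

# A set can be tested for membership any number of times.
known_ids = {d["userid"] for d in list1}

# Set difference: the result is unordered and duplicates are collapsed.
new_as_set = {d["userid"] for d in list2} - known_ids

# Order-preserving alternative: walk list2 once and test against the set.
newpeople = [d["userid"] for d in list2 if d["userid"] not in known_ids]

print(new_as_set)  # {'34892'}
print(newpeople)   # ['34892']
```

Both versions do one average constant-time lookup per userid, so either should be fine for 50,000 dictionaries. The set difference is shorter, while the comprehension keeps order and any repeated userids.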

Upvotes: 3
