Reputation: 1317
I've been trying to compare two lists of dictionaries to find the userids of new people in list2 that aren't in list1. For example, the first list:
list1 = [{"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"}, {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"}]
and the second list:
list2 = [{"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"}, {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"}, {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"}]
the desired output:
newpeople = ['34892']
This is what I've managed to put together:
list1tuple = ((d["userid"]) for d in list1)
list2tuple = ((d["userid"]) for d in list2)
newpeople = [t for t in list2tuple if t not in list1tuple]
This actually seems to be pretty efficient, especially considering the lists I am using might contain over 50,000 dictionaries. However, here's the issue:
If it finds a userid in list2 that indeed isn't in list1, it adds it to newpeople (as desired), but then also adds every other userid that comes afterwards in list2 to newpeople as well.
So, say list2 contains 600 userids and the 500th userid in list2 isn't found anywhere in list1, the first item in newpeople will be the 500th userid (again, as desired), but then followed by the other 100 userids that came after the new one.
This is pretty perplexing to me - I'd greatly appreciate anyone helping me get to the bottom of why this is happening.
Upvotes: 1
Views: 51
Reputation: 109546
As can be seen from a Python console, list1tuple and list2tuple are generators, not tuples:
>>> ((d["userid"]) for d in list1)
<generator object <genexpr> at 0x10a9936e0>
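The second one can remain a generator, since it is iterated over only once (there is no need to build the full list). The first one, however, is tested for membership repeatedly, and every membership test consumes part of the generator, so it should first be converted to a list, set or tuple. A set gives the fastest lookups, e.g.: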
list1set = {d['userid'] for d in list1}
list2generator = (d['userid'] for d in list2)
You can now check for membership in the set:
>>> [t for t in list2generator if t not in list1set]
['34892']
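Putting the pieces above together, here is a minimal sketch of the same approach wrapped in a function. The name find_new_userids is only illustrative (it is not from the question), and the sample data is the question's data with the missing commas added.

```python
def find_new_userids(old_records, new_records):
    """Return the userids in new_records that are absent from old_records.

    The result keeps the order in which the userids appear in new_records.
    """
    # Build the lookup set once; membership tests on a set do not consume it.
    known = {d["userid"] for d in old_records}
    return [d["userid"] for d in new_records if d["userid"] not in known]


list1 = [
    {"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"},
    {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"},
]
list2 = list1 + [
    {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"},
]

newpeople = find_new_userids(list1, list2)
print(newpeople)  # ['34892']
```

If a new userid appears more than once in list2, this version reports it once per occurrence.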
Upvotes: 1
Reputation: 23176
Currently you have set list1tuple and list2tuple as:
list1tuple = ((d["userid"]) for d in list1)
list2tuple = ((d["userid"]) for d in list2)
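These are generators, not lists (or tuples), which means they can only be iterated over once. Each "t not in list1tuple" test advances list1tuple until it finds a match. The first userid with no match runs the generator to the end, so every later test sees an exhausted generator and reports its userid as new. That is what is causing your problem.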
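A small sketch that reproduces the behaviour may make this clearer. The three-record lists and the names list1gen and list2gen are made up for illustration and are much smaller than the real data.

```python
list1 = [{"userid": "1"}, {"userid": "2"}]
list2 = [{"userid": "1"}, {"userid": "3"}, {"userid": "2"}]

list1gen = (d["userid"] for d in list1)
list2gen = (d["userid"] for d in list2)

# "1": the test pulls "1" from list1gen and stops at the match.
# "3": the test pulls "2", finds no match, and exhausts list1gen.
# "2": list1gen is already empty, so "2" is wrongly reported as new.
newpeople = [t for t in list2gen if t not in list1gen]
print(newpeople)  # ['3', '2']

# Nothing is left in the generator afterwards.
leftover = list(list1gen)
print(leftover)  # []
```

Only "3" is actually new here, yet "2" shows up as well because it comes after the first miss.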
You could change them to be lists:
list1tuple = [d["userid"] for d in list1]
list2tuple = [d["userid"] for d in list2]
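which would allow you to iterate over them as many times as you like. However, each membership test on a list is a linear scan, which adds up quickly with 50,000 entries, so a better solution would be to simply make them sets: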
list1tuple = set(d["userid"] for d in list1)
list2tuple = set(d["userid"] for d in list2)
And then take the set difference (note that the result is a set, so the order from list2 is not preserved):
newpeople = list2tuple - list1tuple
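As a minimal sketch of the set-difference approach, using the question's data with the missing commas added: the names ids1, ids2 and new_ids are chosen here for illustration, and the sorted() call is an assumption about wanting a list back, as in the desired output.

```python
list1 = [
    {"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"},
    {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"},
]
list2 = list1 + [
    {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"},
]

ids1 = {d["userid"] for d in list1}
ids2 = {d["userid"] for d in list2}

# The difference of two sets is itself a set, with no guaranteed order.
new_ids = ids2 - ids1

# sorted() turns it into a list; the userids are strings, so the
# ordering is lexicographic rather than numeric.
newpeople = sorted(new_ids)
print(newpeople)  # ['34892']
```

If the order from list2 matters, filtering list2 against ids1 in a list comprehension keeps it, at the cost of not removing duplicates.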
Upvotes: 3