eljobso

Reputation: 362

Get intersection of list elements with different sublist datatypes

I have two lists whose elements are themselves lists, e.g.:

list1 = [['placeholder1', {'data': 'data1'}], ['placeholder2', {'data': 'data2'}], ['placeholder2', {'data': 'data1'}]]
list2 = [['placeholder2', {'data': 'data2'}], ['placeholder3', {'data': 'data5'}]]

intersection_result = [['placeholder2', {'data': 'data2'}]]

The structure of the sub-list elements is just an example. The sub-lists may also contain only strings, e.g. ['asdf', 'qwert'], or a mixture of strings and numbers, e.g. ['sdfs', 232]. However, the sub-list structure is always the same in both lists.

How can I get the intersection, i.e. the list elements that are identical in both lists?

Upvotes: 1

Views: 74

Answers (2)

DhruvPathak

Reputation: 43265

A simple solution that is independent of the structure of your data: generate a signature hash (using json or pformat) for each element, then find the hashes common to both list1 and list2.

Demo : http://ideone.com/5i9cs8

import json

list1 = [['placeholder1', {'data': 'data1'}], ['placeholder2', {'data': 'data2'}], ['placeholder2', {'data': 'data1'}]]
list2 = [['placeholder2', {'data': 'data2'}], ['placeholder3', {'data': 'data5'}]]
sig1 = { hash(json.dumps(x, sort_keys=True)):x for x in list1 }
sig2 = { hash(json.dumps(x, sort_keys=True)):x for x in list2 }
result = {x:sig1[x] for x in sig1 if x in sig2}
print(result)
# prints e.g. {-7754841686355067234: ['placeholder2', {'data': 'data2'}]}
# (the key is a hash value, so it may differ between runs or Python versions)
  • If your dictionaries hold data that json cannot serialize (e.g. datetime), pformat works well; cPickle is another option, and str also works for simple cases. Choose based on your dataset and the efficiency you need.
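As a rough sketch of the same idea without the extra hash() step: the serialized string can itself serve as the lookup key, which avoids relying on hash values that may collide or change between runs. The helper name signature and the default=str fallback for values json cannot encode are assumptions for illustration, not part of the answer above.

```python
import json

def signature(item):
    # Canonical text form of one element. sort_keys makes dict key order
    # irrelevant; default=str is an assumed fallback for values that json
    # cannot encode directly (e.g. datetime).
    return json.dumps(item, sort_keys=True, default=str)

list1 = [['placeholder1', {'data': 'data1'}], ['placeholder2', {'data': 'data2'}], ['placeholder2', {'data': 'data1'}]]
list2 = [['placeholder2', {'data': 'data2'}], ['placeholder3', {'data': 'data5'}]]

# Build the set of signatures once, then keep the elements of list1 found in it.
signatures2 = {signature(x) for x in list2}
result = [x for x in list1 if signature(x) in signatures2]
print(result)
```

This returns a plain list of the matching elements rather than a dict keyed by hash. Note that the comparison is on the serialized text, so it can differ from == in edge cases (for example, 1 and 1.0 serialize differently).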

Upvotes: 1

Dimitris Fasarakis Hilliard

Reputation: 160597

If my understanding is correct, you can get the intersection by selecting each element of the smaller list for which any() element of the larger list is equal to it.

With a comprehension, this would look like this:

intersection_res = [l for l in min(list2, list1, key=len) if any(l == l2 for l2 in max(list1, list2, key=len))]

This uses min and max with key=len to always iterate over the smaller list and check against the larger one.

This yields:

print(intersection_res)
[['placeholder2', {'data': 'data2'}]]

This comprehension can be trimmed down if you pre-assign the min-max lists or, of course, if you are always certain which list is larger than the other:

# Parentheses are required: without them the conditional binds only to the middle operand
sm, la = (list1, list2) if len(list1) < len(list2) else (list2, list1)
intersection_res = [l for l in sm if any(l == l2 for l2 in la)]
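For reuse, the equality-based approach above could be wrapped in a small function. This is only a sketch; the function name intersect and the second usage example are assumptions chosen to mirror the sub-list shapes mentioned in the question.

```python
def intersect(a, b):
    # Iterate over the shorter list and test membership in the longer one.
    small, large = (a, b) if len(a) < len(b) else (b, a)
    # Elements are compared with ==, so unhashable items (lists, dicts) are fine.
    return [item for item in small if any(item == other for other in large)]

list1 = [['placeholder1', {'data': 'data1'}], ['placeholder2', {'data': 'data2'}], ['placeholder2', {'data': 'data1'}]]
list2 = [['placeholder2', {'data': 'data2'}], ['placeholder3', {'data': 'data5'}]]

print(intersect(list1, list2))
print(intersect([['asdf', 'qwert'], ['sdfs', 232]], [['sdfs', 232]]))
```

Because every element of one list is compared against every element of the other, the running time grows with the product of the two lengths, which is usually acceptable for short lists.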

Upvotes: 3
