Reputation: 218
I'm trying to find the intersection of 5 lists of datetime objects. I know questions about intersecting lists have come up a lot on here, but my code (which is similar to the answers to those questions) is not behaving as expected.
Here are the first 3 elements of each of the 5 lists, with each list's exact length noted at the end.
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38790
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38818
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38959
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38802
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 40415
I've made a list of these lists called times. I've tried 2 methods of intersecting.
Method 1:
intersection = times[0] # make intersection the first list
for i in range(len(times)):
    if i == 0:
        continue
    intersection = [val for val in intersection if val in times[i]]
This method results in a list with length 20189 and takes 104 seconds to run.
Method 2:
intersection = times[0] # make intersection the first list
for i in range(len(times)):
    if i == 0:
        continue
    intersection = list(set(intersection) & set(times[i]))
This method results in a list with length 20148 and takes 0.1 seconds to run.
I've run into 2 problems with this. The first is that the two methods yield intersections of different sizes, and I have no clue why. The second is that the datetime object datetime.datetime(2014, 8, 14, 19, 25, 6) is clearly in all 5 lists (see above), but print(datetime.datetime(2014, 8, 14, 19, 25, 6) in intersection) prints False.
Upvotes: 1
Views: 2796
Reputation: 133998
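Your first list, times[0], has duplicate elements; this is the reason for the inconsistency. Method 1 keeps every copy of a duplicated value that survives the filtering, while the sets in Method 2 keep only one. If you started your first snippet with intersection = list(set(times[0])), the problem would go away.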
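A minimal sketch of that effect, using small hypothetical lists (not the real data) in which the first list deliberately contains a repeated timestamp:

```python
import datetime

# Hypothetical sample data; t1 appears twice in the first list.
t1 = datetime.datetime(2014, 8, 14, 19, 25, 6)
t2 = datetime.datetime(2014, 8, 14, 19, 25, 7)
t3 = datetime.datetime(2014, 8, 14, 19, 25, 9)
times = [
    [t1, t1, t2, t3],
    [t1, t2],
    [t1, t2, t3],
]

# Method 1 style: filtering a list keeps every surviving duplicate.
method1 = times[0]
for other in times[1:]:
    method1 = [val for val in method1 if val in other]

# Method 2 style: sets keep only one copy of each value.
method2 = set(times[0])
for other in times[1:]:
    method2 &= set(other)

# Deduplicating the starting list makes Method 1 agree with Method 2.
fixed = list(set(times[0]))
for other in times[1:]:
    fixed = [val for val in fixed if val in other]

print(len(method1), len(method2), len(fixed))  # 3 2 2
```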
As for your second snippet, it will be faster if you don't keep converting back and forth between lists and sets:
intersection = set(times[0]) # make a set of the first list
for timeset in times[1:]:
    intersection.intersection_update(timeset)
# if necessary make into a list again
intersection = list(intersection)
And since set.intersection accepts multiple iterables as separate arguments, you can replace all of your code with:
intersection = set(times[0]).intersection(*times[1:])
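Converting the result with list(intersection) gives an arbitrary order. If you need a list that follows the order of times[0], one possible sketch is below; ordered_intersection is a hypothetical helper, not part of this answer. It drops duplicates and, unlike Method 1, checks membership against a set, so each lookup is an average O(1) hash lookup instead of a scan through a whole list (which is what makes Method 1 slow).

```python
import datetime

def ordered_intersection(times):
    """Hypothetical helper: intersect several lists, keeping the order of
    the first list and dropping duplicates."""
    common = set(times[0]).intersection(*times[1:])
    # dict.fromkeys removes duplicates while keeping first-seen order
    # (dicts preserve insertion order in Python 3.7+).
    return [t for t in dict.fromkeys(times[0]) if t in common]

# Hypothetical sample data.
t = [datetime.datetime(2014, 8, 14, 19, 25, s) for s in (6, 7, 9)]
times = [[t[2], t[0], t[0], t[1]], [t[0], t[2]], [t[2], t[1], t[0]]]
result = ordered_intersection(times)
print(result)
```

If chronological order is what you want instead, sorted(intersection) is enough, since datetime objects are ordered.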
As for the in intersection problem: is the instance an actual datetime.datetime, or just something pretending to be one? At least the timestamps do not appear to be timezone-aware.
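To illustrate why timezone awareness would matter, here is a small sketch assuming Python 3.3+, where comparing a naive datetime with an aware one for equality returns False instead of raising. A value whose fields look identical can therefore still fail an in test. The UTC timezone is an assumption made only for the example.

```python
import datetime

naive = datetime.datetime(2014, 8, 14, 19, 25, 6)
aware = datetime.datetime(2014, 8, 14, 19, 25, 6, tzinfo=datetime.timezone.utc)

# Same wall-clock fields, but naive and aware datetimes never compare equal,
# so membership tests against a collection of aware values fail.
print(naive == aware)    # False
print(naive in {aware})  # False

# Inspecting a stored element's exact type and tzinfo can rule this out.
sample = aware  # stand-in for an element taken from your intersection
print(type(sample) is datetime.datetime, sample.tzinfo)
```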
Upvotes: 1
Reputation: 54543
There might be duplicate times in your lists; converting them to sets handles that, so you can simply do:
Python3:
import functools
result = functools.reduce(lambda x, y: set(x) & set(y), times)
Python2:
result = reduce(lambda x, y: set(x) & set(y), times)
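The lambda above calls set() on the accumulator at every step, even though after the first step the accumulator is already a set, so it gets copied each time. A Python 3 sketch of a variant that builds the set once and passes the remaining lists directly (set.intersection accepts any iterable), shown on hypothetical sample data:

```python
import datetime
import functools

# Hypothetical sample data: three short lists of timestamps.
a, b, c = (datetime.datetime(2014, 8, 14, 19, 25, s) for s in (6, 7, 9))
times = [[a, b, c, a], [a, c], [c, b, a]]

# Seed reduce with a set of the first list, then intersect the rest;
# the remaining lists are consumed as-is, without wrapping them in set().
result = functools.reduce(
    lambda acc, lst: acc.intersection(lst),
    times[1:],
    set(times[0]),
)
print(sorted(result))
```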
Upvotes: 0
Reputation: 27812
Lists can have duplicate items, which can cause inconsistencies with the length. To avoid these duplicates, you can turn each list of datetimes into a set:
map(set, times)
This gives you a list of sets with the duplicate times removed (in Python 3, map returns a lazy iterator rather than a list). To find the intersection, you can use set.intersection:
intersection = set.intersection(*map(set, times))
With your example, intersection will be this set:
set([datetime.datetime(2014, 8, 14, 19, 25, 9), datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7)])
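A Python 3 sketch that reproduces this using only the three sample elements shown in the question (the full lists are not available, so each of the 5 lists is assumed to contain just those three values). It also checks the membership test from the question:

```python
import datetime

# Only the first three elements of each list are known from the question.
first_three = [
    datetime.datetime(2014, 8, 14, 19, 25, 6),
    datetime.datetime(2014, 8, 14, 19, 25, 7),
    datetime.datetime(2014, 8, 14, 19, 25, 9),
]
times = [list(first_three) for _ in range(5)]

intersection = set.intersection(*map(set, times))

# Sets are unordered; sorted() turns the result back into a chronological list.
ordered = sorted(intersection)
print(ordered == first_three)  # True
print(datetime.datetime(2014, 8, 14, 19, 25, 6) in intersection)  # True
```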
Upvotes: 0