Ian

Reputation: 218

Python intersection of multiple datetime lists

I'm trying to find the intersection of 5 lists of datetime objects. I know the list-intersection question has come up a lot on here, but my code does not behave as expected, unlike the code in those other questions.

Here are the first 3 elements of each of the 5 lists, with the exact length of each list in the trailing comment.

[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38790
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38818
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38959
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 38802
[datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7), datetime.datetime(2014, 8, 14, 19, 25, 9)] # length 40415

I've made a list of these lists called times. I've tried 2 methods of intersecting.

Method 1:

intersection = times[0] # make intersection the first list
for i in range(len(times)):
    if i == 0:
        continue
    intersection = [val for val in intersection if val in times[i]]

This method results in a list with length 20189 and takes 104 seconds to run.

Method 2:

intersection = times[0] # make intersection the first list
for i in range(len(times)):
    if i == 0:
        continue
    intersection = list(set(intersection) & set(times[i]))

This method results in a list with length 20148 and takes 0.1 seconds to run.

I've run into 2 problems with this. The first is that the two methods yield intersections of different sizes, and I have no clue why. The second is that the datetime object datetime.datetime(2014, 8, 14, 19, 25, 6) is clearly in all 5 lists (see above), but print(datetime.datetime(2014, 8, 14, 19, 25, 6) in intersection) prints False.

Upvotes: 1

Views: 2796

Answers (4)

jfs

Reputation: 414665

intersection = set(*times[:1]).intersection(*times[1:])

Upvotes: 0

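Your first list, times[0], has duplicate elements; this is the reason for the inconsistency. The list comprehension in Method 1 keeps every copy of a duplicated value, while the set operations in Method 2 keep only one. If you started your first snippet with intersection = list(set(times[0])), the two lengths would agree.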
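To see the effect on a small scale, here is a minimal sketch with made-up data (the lists a and b are hypothetical, not taken from the question), where one timestamp appears twice in the first list:

```python
from datetime import datetime

# Hypothetical miniature data: `a` contains the same timestamp twice.
t1 = datetime(2014, 8, 14, 19, 25, 6)
t2 = datetime(2014, 8, 14, 19, 25, 7)
a = [t1, t1, t2]
b = [t1, t2]

# Method 1 style: every element of `a` found in `b` is kept, duplicates included.
by_comprehension = [val for val in a if val in b]

# Method 2 style: converting to sets collapses the duplicate first.
by_sets = list(set(a) & set(b))

print(len(by_comprehension))  # 3
print(len(by_sets))           # 2
```

The difference between the two lengths is exactly the number of surplus duplicate copies that survive the comprehension.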

As for your second snippet, the code will be faster if you avoid converting back and forth between lists and sets on every iteration:

intersection = set(times[0]) # make a set of the first list
for timeset in times[1:]:
    intersection.intersection_update(timeset)

# if necessary make into a list again
intersection = list(intersection)

And actually, since intersection accepts multiple iterables as separate arguments, you can simply replace all your code with:

intersection = set(times[0]).intersection(*times[1:])

As for the in intersection problem: is the instance an actual datetime.datetime, or just pretending to be one? At least the timestamps do not seem to be timezone-aware.
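One way such a mismatch could arise is sketched below. This is only an assumption about a possible cause, not a diagnosis of the data in the question: in Python 3, a naive datetime and a timezone-aware one never compare equal, even when they show the same wall-clock fields, so a membership test between them fails.

```python
from datetime import datetime, timezone

# Hypothetical values: same wall-clock fields, but only one carries tzinfo.
naive = datetime(2014, 8, 14, 19, 25, 6)
aware = datetime(2014, 8, 14, 19, 25, 6, tzinfo=timezone.utc)

intersection = {aware}

# Check the exact type first: a subclass can print like a datetime too.
print(type(naive) is datetime)

# Naive and aware datetimes are never equal, so this membership test fails.
print(naive == aware)
print(naive in intersection)
```

Printing type(x) and repr(x) for one element of each list, and for the value being tested, is a quick way to confirm or rule out this kind of difference.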

Upvotes: 1

zs2020

Reputation: 54543

There might be duplicated times and you can do it simply like this:

Python3:

import functools
result = functools.reduce(lambda x, y: set(x) & set(y), times)

Python2:

result = reduce(lambda x, y: set(x) & set(y), times)
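As a rough check, the sketch below runs the reduce form on a small made-up times list (hypothetical data, not the lists from the question) and compares it with set.intersection. It also shows one edge case worth knowing: when times holds a single list, reduce returns that list unchanged, so the result is not a set.

```python
import functools
from datetime import datetime

# Hypothetical miniature version of `times`.
times = [
    [datetime(2014, 8, 14, 19, 25, 6), datetime(2014, 8, 14, 19, 25, 7)],
    [datetime(2014, 8, 14, 19, 25, 7), datetime(2014, 8, 14, 19, 25, 9)],
    [datetime(2014, 8, 14, 19, 25, 7)],
]

via_reduce = functools.reduce(lambda x, y: set(x) & set(y), times)
via_method = set(times[0]).intersection(*times[1:])

# Both approaches yield the same set of common timestamps.
print(via_reduce == via_method)

# With only one list, the lambda is never called and the list comes back as-is.
single = functools.reduce(lambda x, y: set(x) & set(y), times[:1])
print(type(single).__name__)
```

If a set is needed in every case, wrapping the result in set(...) covers the single-list situation.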

Upvotes: 0

jh314

Reputation: 27812

Lists can have duplicate items, which can cause inconsistencies with the length. To avoid these duplicates, you can turn each list of datetimes into a set:

map(set, times)

In Python 2 this gives you a list of sets (in Python 3, an iterator that yields sets), with duplicate times removed from each. To find the intersection, you can use set.intersection:

intersection = set.intersection(*map(set, times))

With the three elements shown for each list in your example, intersection will be this set:

set([datetime.datetime(2014, 8, 14, 19, 25, 9), datetime.datetime(2014, 8, 14, 19, 25, 6), datetime.datetime(2014, 8, 14, 19, 25, 7)])

Upvotes: 0
