Reputation: 1022
I'm currently doing a project in which I have to check multiple times if a given 2D point is "legal", that is if it's within the boundaries of a grid and if it hasn't been visited before. The grid has a fixed H x W
size, so I thought I could maybe store them not in a set, but a 2D lookup table. To my surprise, it's significantly slower than checking if it appears in a set, even though it's (in theory) an O(logn)
operation, as opposed to O(1)
with a simple array lookup.
First try:
class PointSet:
def __init__(self, h, w):
if h <= 0 or w <= 0:
raise ValueError("Invalid size")
self.h = h
self.w = w
self.lookup = [[False for _ in range(w)] for _ in range(h)]
self.size = 0
def add(self, point):
r, c = point
self.lookup[r][c] = True
self.size += 1
def remove(self, point):
r, c = point
self.lookup[r][c] = False
self.size -= 1
def __contains__(self, point):
r, c = point
return 0 <= r < self.h and 0 <= c < self.w and self.lookup[r][c]
def __bool__(self):
return self.size != 0
def __len__(self):
return self.size
# H, W = ...
# pointset = PointSet(H, W)
# if (3, 5) in pointset:
# ...
Second try:
# pointset = set()
# if (3, 5) in pointset:
# ...
Second code executes much faster. Why is that the case?
Upvotes: 1
Views: 87
Reputation: 78556
First, you have to consider that your implementation of __contains__
method calls __getitem__
twice (because you have a list of lists) with each having 0(1) complexity, and would evaluate two chained comparisons (0 <= r < self.h
and 0 <= c < self.w
); in the worst case, all the expressions are evaluated since and
will only short-circuit on the first False
. There's an additional overhead coming from the iterable unpacking in the preceding line. There are also minor additions from the attribute lookups.
Membership lookup for sets is just plain 0(1), so I don't see how your code can possibly beat that.
Upvotes: 1