Evan Fosmark

Reputation: 101681

Proper way of having a unique identifier in Python?

Basically, I have a list like [START, 'foo', 'bar', 'spam', 'eggs', END], and I need the START/END identifiers so I can compare against them later on. Right now, I have it set up like this:

START = object()
END = object()

This works fine, but it suffers from the problem of not working with pickling. I tried doing it the following way, but it seems like a terrible method of accomplishing this:

class START(object): pass
class END(object): pass

Could anybody share a better means of doing this? Also, the example I have set up above is just an oversimplification of a different problem.
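
To make the pickling problem concrete, here is a minimal sketch (assuming Python 3 and the standard pickle module) of what happens to an object() sentinel on a round trip. The list contents are only illustrative.

```python
import pickle

START = object()
items = [START, 'foo', 'bar']

# Round-trip the list through pickle, as if saving and reloading it.
restored = pickle.loads(pickle.dumps(items))

# The strings come back equal, but the sentinel comes back as a brand-new
# object, so neither `is` nor the default `==` recognizes it any more.
print(restored[1:] == items[1:])  # True
print(restored[0] is START)       # False
print(restored[0] == START)       # False
```

So the sentinel itself pickles without error; what gets lost is its identity.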

Upvotes: 6

Views: 1487

Answers (5)

Casebash

Reputation: 118792

If your list didn't have strings, I'd just use "start" and "end", since CPython usually interns short string literals like these, which makes the comparison effectively O(1).

If you do need strings, but not tuples, the complete cheapskate method is:

[("START",), 'foo', 'bar', 'spam', 'eggs', ("END",)]
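
A rough sketch of how the one-element tuple markers behave, assuming Python 3. Tuples compare by value, so the markers should be checked with == rather than is. The list contents are made up for illustration.

```python
import pickle

START = ("START",)
END = ("END",)

# The plain string 'START' is ordinary data here, not a marker.
items = [START, 'foo', 'START', END]

restored = pickle.loads(pickle.dumps(items))

# Tuples compare by value, so the markers still match after the round trip.
print(restored[0] == START)   # True
print(restored[-1] == END)    # True

# A string with the same text is not mistaken for the marker.
print(restored[2] == START)   # False
```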

PS: I was sure your list was numbers before, not strings, but I can't see any revisions, so I must have imagined it.

Upvotes: 1

Anand Chitipothu

Reputation: 4367

You can define a Symbol class for handling START and END.

class Symbol:
    def __init__(self, value):
        self.value = value

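    def __eq__(self, other):
        return isinstance(other, Symbol) and other.value == self.value

    def __hash__(self):
        # Defining __eq__ disables the default hash in Python 3; restore one
        # consistent with equality so symbols work in sets and as dict keys.
        return hash(self.value)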

    def __repr__(self):
        return "<sym: %r>" % self.value

    def __str__(self):
        return str(self.value)

START = Symbol("START")
END = Symbol("END")

# test pickle
import pickle
assert START == pickle.loads(pickle.dumps(START))
assert END == pickle.loads(pickle.dumps(END))

Upvotes: 2

steveha

Reputation: 76715

Actually, I like your solution.

A while back I was hacking on a Python module, and I wanted to have a special magical value that could not appear anywhere else. I spent some time thinking about it and the best I came up with is the same trick you used: declare a class, and use the class object as the special magical value.

When you are checking for the sentinel, you should of course use the is operator, for object identity:

for x in my_list:
    if x is START:
        pass  # handle start of list
    elif x is END:
        pass  # handle end of list
    else:
        pass  # handle item from list
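
As a fuller illustration of the loop above, here is a small sketch that uses class objects as sentinels and collects whatever sits between them. The helper name items_between and the sample list are hypothetical, chosen only for this example.

```python
class START(object): pass
class END(object): pass

def items_between(seq):
    """Return the items that sit between the START and END sentinels."""
    collecting = False
    result = []
    for x in seq:
        if x is START:
            collecting = True      # start of the marked region
        elif x is END:
            break                  # end of the marked region
        elif collecting:
            result.append(x)       # ordinary item inside the region
    return result

my_list = ['ignored', START, 'foo', 'bar', 'spam', 'eggs', END, 'also ignored']
print(items_between(my_list))  # ['foo', 'bar', 'spam', 'eggs']
```

Because the check uses is, a string such as 'START' in the data can never be confused with the sentinel.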

Upvotes: 1

Alex Martelli

Reputation: 881635

If you want an object that's guaranteed to be unique and can also be guaranteed to get restored to exactly the same identity if pickled and unpickled right back, then top-level functions, classes, class instances, and (if you care about is rather than ==) also lists and other mutables are all fine. That is, any of:

# work for == as well as is
class START(object): pass
def START(): pass
class Whatever(object): pass
START = Whatever()

# if you don't care for "accidental" == and only check with `is`
START = []
START = {}
START = set()

None of these is terrible, and none has any special advantage (beyond whether you care about == or just is). Probably def wins by dint of generality, conciseness, and lighter weight.
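
The "accidental" == point can be seen in a short sketch like the following. The name LIST_MARK is hypothetical and stands in for a mutable used as a sentinel.

```python
def START(): pass    # a function used as a sentinel

LIST_MARK = []       # a mutable used as a sentinel (hypothetical name)

# The middle element is an ordinary empty list that happens to be data.
data = [START, [], LIST_MARK]

# A function is equal only to itself, so == and is agree.
print(data[0] == START, data[0] is START)   # True True

# An unrelated empty list is "accidentally" equal to the list sentinel...
print(data[1] == LIST_MARK)                 # True
# ...but identity still tells the two apart.
print(data[1] is LIST_MARK)                 # False
print(data[2] is LIST_MARK)                 # True
```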

Upvotes: 10

Jack Lloyd

Reputation: 8405

I think maybe this would be easier to answer if you were more explicit about what you need this for, but my inclination if faced with a problem like this would be something like:

>>> import os
>>> START = os.urandom(16).hex()   # Python 3; on Python 2 use .encode('hex')
>>> END = os.urandom(16).hex()

Pros of this approach, as I see it:

  • Your markers are strings (can pickle or otherwise easily serialize, e.g. to JSON or a DB, without any special effort)
  • Very unlikely to collide either accidentally or on purpose
  • Will serialize and deserialize to identical values, even across process restarts, which (I think) would not be the case for object() or an empty class.

Cons(?)

  • Each time they are newly chosen they will be completely different. (This being good or bad depends on details you have not provided, I would think).
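
A minimal sketch of the serialization point, assuming Python 3 (where bytes.hex() is available) and using JSON as the example format. The list contents are illustrative.

```python
import json
import os

# 16 random bytes rendered as a 32-character hex string.
START = os.urandom(16).hex()
END = os.urandom(16).hex()

items = [START, 'foo', 'bar', END]

# Plain strings survive a JSON round trip unchanged.
restored = json.loads(json.dumps(items))

print(restored[0] == START and restored[-1] == END)  # True
print(len(START))                                    # 32
```

For the markers to be recognized after a restart, the generated values themselves would have to be stored somewhere, since a fresh call produces different strings.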

Upvotes: 0
