I am building a general-purpose NLP pipeline that can use any one of several state-of-the-art NLP libraries. The library and the exact model to use are specified when instantiating the pipeline. For this, I have created an enum whose values are passed to the pipeline's init. It looks like this for now (only the spaCy part is ready):
from enum import Enum, auto, unique

@unique
class LibrKey(Enum):
    """NLP-library to use."""
    CUSTOM = auto()
    SPACY_SM = "en_core_web_sm"
    SPACY_MD = "en_core_web_md"
    SPACY_LG = "en_core_web_lg"
    SPACY_TR = "en_core_web_trf"
    STANZA = auto()
    TRANKIT = auto()
However, I am unsure whether it is correct to have both auto instances and strings as values in the same enum. According to the latest enum documentation (https://docs.python.org/3/library/enum.html), care should be taken in these cases:
Member values can be anything: int, str, etc. If the exact value is unimportant you may use auto instances and an appropriate value will be chosen for you. Care must be taken if you mix auto with other values.
So I would also like to know what exactly is meant by having to take care here.
Upvotes: 1
Reputation: 69288
The reason for the warning is that mixing auto-assigned numeric values and manually assigned numeric values could end up duplicating values. For Enum and IntEnum, auto() uses the last value seen and increments it by one, so it's possible to specify a value already given by auto() and end up with an alias instead of a unique member:
class Confused(Enum):
    ONE = auto()
    TWO = auto()
    THREE = auto()
    FOUR = auto()
    TRIANGLE = 3
    FIVE = auto()
In the above enumeration, FIVE is an alias for FOUR.
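One way to see which names ended up as aliases is to compare each name in __members__ with the canonical name of the member it resolves to. The sketch below is only illustrative, and the helper name find_aliases is made up for this example. Which member FIVE maps to may depend on how the running Python version's auto() picks its next value, so the aliases are reported rather than hard-coded.

```python
from enum import Enum, auto


class Confused(Enum):
    ONE = auto()
    TWO = auto()
    THREE = auto()
    FOUR = auto()
    TRIANGLE = 3   # same value as THREE, so this name becomes an alias
    FIVE = auto()  # value depends on how auto() continues after TRIANGLE


def find_aliases(enum_cls):
    """Map each alias name to the name of the canonical member it points to.

    Hypothetical helper for illustration; not part of the enum module.
    """
    return {
        name: member.name
        for name, member in enum_cls.__members__.items()
        if name != member.name
    }


aliases = find_aliases(Confused)
print(aliases)
```

Iterating over the class itself (list(Confused)) skips aliases, which is why the check goes through __members__ instead.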
In certain situations mixing them can be quite effective, as seen here (slightly different syntax, same idea).
Using None or Ellipsis (or any identical or same-valued object) as a placeholder could be a bad idea, depending on your usage: when several members share an identical or equal value, only the first one is a unique member and all the others are aliases (and the definition would fail under the unique decorator). If you want to be absolutely clear about which ones are not yet usable, you can create your own _generate_next_value_ (which is what auto() uses):
# Define this inside the enum class body, before any members.
def _generate_next_value_(name, start, count, last_values, *args, **kwds):
    return '%s not implemented' % (name, )
which would result in:
>>> list(LibrKey)
[
    <LibrKey.CUSTOM: 'CUSTOM not implemented'>,
    <LibrKey.SPACY_SM: 'en_core_web_sm'>,
    <LibrKey.SPACY_MD: 'en_core_web_md'>,
    <LibrKey.SPACY_LG: 'en_core_web_lg'>,
    <LibrKey.SPACY_TR: 'en_core_web_trf'>,
    <LibrKey.STANZA: 'STANZA not implemented'>,
    <LibrKey.TRANKIT: 'TRANKIT not implemented'>,
]
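Putting the pieces together, a complete version of the enum from the question might look like the sketch below. It assumes the placeholder string is acceptable as a member value. The is_ready helper is hypothetical and only shows one way the placeholder could be checked before a pipeline tries to load a model.

```python
from enum import Enum, auto, unique


@unique
class LibrKey(Enum):
    """NLP library to use."""

    # Must be defined before the members so that auto() picks it up.
    def _generate_next_value_(name, start, count, last_values):
        return '%s not implemented' % (name, )

    CUSTOM = auto()
    SPACY_SM = "en_core_web_sm"
    SPACY_MD = "en_core_web_md"
    SPACY_LG = "en_core_web_lg"
    SPACY_TR = "en_core_web_trf"
    STANZA = auto()
    TRANKIT = auto()


def is_ready(key):
    """Hypothetical helper: True if the member carries a real model name."""
    return not key.value.endswith(' not implemented')


# Names of the members that can actually be used right now.
ready = [key.name for key in LibrKey if is_ready(key)]
print(ready)
```

Because every placeholder embeds the member's own name, all values stay distinct and the unique decorator does not complain.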
Disclosure: I am the author of the Python stdlib Enum, the enum34 backport, and the Advanced Enumeration (aenum) library.
Upvotes: 1