scanny
scanny

Reputation: 28953

leading underscore in Python class names

This question builds on this one asked earlier and also this one, but addresses a more subtle point, specifically, what counts as an "internal class"?

Here's my situation. I'm building a Python library for manipulating MS Office files in which many of the classes are not meant to be constructed externally, yet many of their methods are important parts of the API. For example:

class _Slide(object):
    def add_shape(self):
        ...

_Slide is not to be constructed externally, as an end-user of the library you get a new slide by calling Presentation.add_slide(). However, once you have a slide you definitely want to be able to call add_shape(). So the method is part of the API but the constructor isn't. This situation arises dozens of times in the library because only the Presentation class has an external/API constructor.

PEP8 is ambiguous on this point, making reference to "internal classes" without elaboration of what counts as an internal class. In this case I suppose one could say the class is "partially internal".

The problem is I only have two notations to distinguish at least three different situations:

  1. The constructor and "public/API" methods of the class are all available for use.
  2. The constructor is not to be used, but the "public/API" methods of the class are.
  3. The class really is internal, has no public API, and I reserve the right to change or remove it in a future release.

I note that from a communications perspective, there is a conflict between clear expression of the external API (library end-user audience) and the internal API (developer audience). To an end user, calling _Slide() is verboten/at-own-risk. However it's a happy member of the internal API and distinct from _random_helper_method() which should only be called from within _Slide().

Do you have a point of view on this question that might help me? Does convention dictate that I use my "single-leading-underscore" ammunition on class names to fight for clarity in the end-user API or can I feel good about reserving it to communicate to myself and other developers about when a class API is really private and not to be used, for example, by objects outside the module it lives in because it is an implementation detail that might change?


UPDATE: After a few years of further reflection, I've settled into the convention of using a leading underscore when naming classes that are not intended to be accessed as a class outside their module (file). Such access is typically to instantiate an object of that class or to access a class method, etc.

This provides users of the module (often yourself of course :) the basic indicator: "If a class name with a leading underscore appears in an import statement, you're doing something wrong." (Unit test modules are an exception to this rule, such imports may often appear in the unit tests for an "internal" class.)

Note that this is access to the class. Access to an object of that class (type) outside the module, perhaps provided by a factory or whatever, is perfectly fine and perhaps expected. I think this failure to distinguish classes from the objects created from them was what led to my initial confusion.

This convention also has the side benefit of not including these classes when making a from module import * statement. Although I never use these in my own code and recommend avoiding them, it's an appropriate behavior because those class identifiers are not intended to be part of the module interface.

This is my personal "best-practice" after years of trial, and not of course the "right" way. Your mileage may vary.

Upvotes: 22

Views: 12874

Answers (4)

martineau
martineau

Reputation: 123491

I couldn't tell from just the description in your question, but from the additional information you provided in a comment, I think your Slide class is actually public.

This is true despite the fact that instances will only be created indirectly by calling the add_slide() method of a Presentation because the caller will then be free (and more likely required) to call the instance's methods to manipulate it afterward. In my opinion a truly private class would only be accessed by the methods of the class that "owns" it.

Letting things be any other way both breaks encapsulation and increases the coupling between components of your design, both of which are undesirable and should be avoided as much as possible for flexibility and reusability.

Upvotes: 5

SingleNegationElimination
SingleNegationElimination

Reputation: 156238

I think martineau's answer is a good one, certainly the simplest and no doubt most pythonic.

It is, though, by no means the only option.

A technique used frequently is to define the public methods as part of an interface type; for instance zope.interface.Interface is used widely in the twisted framework. Modern python would probably use abc.ABCMeta for the same effect. Essentially, the public method Presentation.add_slide is documented as returning some instance of AbstractSlide, which has more public methods; but since it's not possible to construct an instance of AbstractSlide directly, there's no public way to create one.

That technique can be particularly handy, since mock instances of the abstract type can be created that can assert that only the public methods are called; quite useful for unit testing (especially for the users of your library, so that they can make sure they're using only the public interfaces).

Another option; since the only publicly available way to create instances of the slide class is through Presentation.add_slide; you could make that literally be the constructor. Such would probably require some metaclass shenanegans, something like this should do.

from functools import partial

class InstanceConstructor(type):
    def __get__(cls, instance, owner):
        if instance is not None:
            return partial(cls, instance)
        return cls

And just define the add_slide behavior in the __init__ of the class:

>>> class Presentation(object):
...     def __init__(self):
...         self.slides = []
...
...     class add_slide(object):
...         __metaclass__ = InstanceConstructor
...
...         def __init__(slide, presentation, color):
...             slide.color = color
...             presentation.slides.append(slide)
...
>>> p = Presentation()
>>> p.add_slide('red')
<instance_ctor.add_slide object at ...>
>>> p.slides
[<instance_ctor.add_slide object at ...>]
>>> p.slides[0].color
'red'

Upvotes: 4

Mark R. Wilkins
Mark R. Wilkins

Reputation: 1302

What is an "internal" class is a matter of convention, which is why the PEP didn't define the term, but if you're exposing these to your users (providing methods that return this class, or rely on having this class passed around) it's not really "internal" at all.

You have three options, really. First might be documenting and making clear your intention for how this class be constructed, in other words allowing your users to construct and use instances of the class, and documenting how to do this. Since you want your higher-level class tracking these Slide objects, make that something the constructor interface mandates and sets up, so that users of your module can make these just like you.

Another would be to wrap those aspects of this class that you want to expose in the class that creates and manages these instances, in other words making this a truly internal class. If you only need to expose one method (say, add_shape()) this might be very reasonable. If you want a large chunk of the interface to this class to be exposed, though, that starts to look clumsy.

Finally, you could just clearly document that users shouldn't make instances of this class themselves, but have access to parts of the interface. This is confusing but might be an option.

I like the first choice, myself: expose the class to the user of your module and use its constructor and destructor implementation to ensure that it's always connected in the right way to other classes that need to know about it. Then, users can do things like subclass this thing and wrap methods they'd like to be calling anyway, and you make sure your objects stay connected.

Upvotes: 2

eri
eri

Reputation: 3514

__all__ = ['Presentation'] in module hides all classes from functions like help() except Presentation class. But developers still have access to call constructors and methods from outside.

Upvotes: 0

Related Questions