ilovewt

Reputation: 1023

Can someone provide an example of, and motivation for, when to use covariance and contravariance?

I have a background in machine/deep learning but I aspire to be a good software engineer as well.

I have some trouble finding real use cases of covariance/contravariance (partly because this is a new concept for me and the initial learning curve is steep).

I would like a concrete motivation and example of when covariance/contravariance is used. In particular, I would appreciate an example where, if covariance/contravariance were not applied, the application would be buggy or not type safe.

To start, I know that PyTorch's Dataset and DataLoader are parametrized by a covariant type:

from typing import Generic, TypeVar

T_co = TypeVar('T_co', covariant=True)

class Dataset(Generic[T_co]):
    r"""An abstract class representing a :class:`Dataset`.

    All datasets that represent a map from keys to data samples should subclass
    it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
    data sample for a given key. Subclasses could also optionally overwrite
    :meth:`__len__`, which is expected to return the size of the dataset by many
    :class:`~torch.utils.data.Sampler` implementations and the default options
    of :class:`~torch.utils.data.DataLoader`. Subclasses could also
    optionally implement :meth:`__getitems__` to speed up batched sample
    loading. This method accepts a list of sample indices for a batch and
    returns a list of samples.

    .. note::
      :class:`~torch.utils.data.DataLoader` by default constructs an index
      sampler that yields integral indices.  To make it work with a map-style
      dataset with non-integral indices/keys, a custom sampler must be provided.
    """

    def __getitem__(self, index) -> T_co:
        raise NotImplementedError("Subclasses of Dataset should implement __getitem__.")

    # def __getitems__(self, indices: List) -> List[T_co]:
    # Not implemented to prevent false-positives in fetcher check in
    # torch.utils.data._utils.fetch._MapDatasetFetcher

    def __add__(self, other: 'Dataset[T_co]') -> 'ConcatDataset[T_co]':
        return ConcatDataset([self, other])

    # No `def __len__(self)` default?
    # See NOTE [ Lack of Default `__len__` in Python Abstract Base Classes ]
    # in pytorch/torch/utils/data/sampler.py

I wonder if someone can come up with a convincing example of why a Dataset needs to be covariant.

Upvotes: 0

Views: 150

Answers (1)

chepner

Reputation: 532003

The short answer is that you want Dataset[bool] to be a subtype of Dataset[int], because bool is a subtype of int. By default, generic types are invariant, because the assumption is that the type may be mutable.
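Here is a minimal sketch of what that buys you (the LabelDataset and total names are illustrative, not part of PyTorch). A checker like mypy accepts the final call precisely because T_co is covariant; if T_co were an ordinary invariant TypeVar, the call would be rejected:

from typing import Generic, TypeVar

T_co = TypeVar('T_co', covariant=True)

class Dataset(Generic[T_co]):
    def __getitem__(self, index: int) -> T_co:
        raise NotImplementedError

class LabelDataset(Dataset[bool]):
    # A dataset of booleans, e.g. binary labels.
    def __getitem__(self, index: int) -> bool:
        return index % 2 == 0

def total(ds: Dataset[int]) -> int:
    # Only *reads* from the dataset, so a dataset of any subtype of int is safe.
    return sum(ds[i] for i in range(4))

print(total(LabelDataset()))  # OK: Dataset is covariant, so Dataset[bool] <: Dataset[int]
# With an invariant TypeVar, mypy would flag this call as an incompatible
# argument type, since Dataset[bool] would then be unrelated to Dataset[int].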

Compare tuples and lists. Tuples are covariant, because you can't modify them. The only thing you can do is read from them, so if you need a tuple[int, ...] value, a tuple of any subclass of int will do. Lists, on the other hand, are invariant, because you don't necessarily know whether you will be reading from the list or writing to it. (If only reading ints, you could accept a list[bool] in place of a list[int]; if only writing bools, you could accept a list[int] in place of a list[bool]. But in general, neither is substitutable for the other.)
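A short runnable illustration, assuming mypy as the checker (Python itself won't stop the unsound call at runtime):

def read_only(xs: tuple[int, ...]) -> int:
    # Only reads from the tuple, so covariance is safe.
    return sum(xs)

def might_write(xs: list[int]) -> None:
    # May insert any int, so accepting a list[bool] would be unsound.
    xs.append(42)

bools_t: tuple[bool, ...] = (True, False)
bools_l: list[bool] = [True, False]

read_only(bools_t)    # OK: tuple is covariant
might_write(bools_l)  # rejected by mypy: list is invariant.
                      # If it were allowed, bools_l would now contain the int 42.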

Contravariance doesn't really come up with traditional containers, but it does when talking about function types. A function f is substitutable for a function g if f accepts all the same arguments that g does (and possibly more), and returns only values that g could return (and possibly fewer). We describe this by saying that function types are contravariant in their argument types and covariant in their return types.
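A minimal sketch with typing.Callable (the Animal/Dog names are purely illustrative):

from typing import Callable

class Animal: ...
class Dog(Animal): ...

def apply(f: Callable[[Dog], Animal], d: Dog) -> Animal:
    return f(d)

def handler(a: Animal) -> Dog:
    # Accepts *more* than required (any Animal, not just Dog) and
    # returns *less* than allowed (always a Dog, which is an Animal).
    return Dog()

apply(handler, Dog())  # OK: Callable[[Animal], Dog] <: Callable[[Dog], Animal]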

Upvotes: 0
