Reputation: 1023
I have a background in machine/deep learning but I aspire to be a good software engineer as well.
I have some troubles finding real use cases of covariant/contravariant (partly because this is a new concept for me and the initial learning curve is difficult).
I would like a concrete motivation and example on when covariant/contravariant is used, in particular, I would appreciate the example to be such that if covariant/contravariant is not applied, then the application would be buggy/not type safe.
To start, I know PyTorch's Dataset
and DataLoader
is parametrized by a covariant type:
class Dataset(Generic[T_co]):
r"""An abstract class representing a :class:`Dataset`.
All datasets that represent a map from keys to data samples should subclass
it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
data sample for a given key. Subclasses could also optionally overwrite
:meth:`__len__`, which is expected to return the size of the dataset by many
:class:`~torch.utils.data.Sampler` implementations and the default options
of :class:`~torch.utils.data.DataLoader`. Subclasses could also
optionally implement :meth:`__getitems__`, for speedup batched samples
loading. This method accepts list of indices of samples of batch and returns
list of samples.
.. note::
:class:`~torch.utils.data.DataLoader` by default constructs a index
sampler that yields integral indices. To make it work with a map-style
dataset with non-integral indices/keys, a custom sampler must be provided.
"""
def __getitem__(self, index) -> T_co:
raise NotImplementedError("Subclasses of Dataset should implement __getitem__.")
# def __getitems__(self, indices: List) -> List[T_co]:
# Not implemented to prevent false-positives in fetcher check in
# torch.utils.data._utils.fetch._MapDatasetFetcher
def __add__(self, other: 'Dataset[T_co]') -> 'ConcatDataset[T_co]':
return ConcatDataset([self, other])
# No `def __len__(self)` default?
# See NOTE [ Lack of Default `__len__` in Python Abstract Base Classes ]
# in pytorch/torch/utils/data/sampler.py
I wonder if someone can come up with convincing example of why a Dataset
needs to be covariant.
Upvotes: 0
Views: 150
Reputation: 532003
The short answer is that you want DataSet[bool]
to be a subtype of DataSet[int]
, because bool
is a subtype of int
. By default, generic types are invariant, because the assumption is that the type will be mutable.
Compare tuples and lists. Tuples are covariant, because you can't modify them. The only thing you can do is read from them, so if you need a tuple[int]
value, a tuple of any subclass of int
will do. Lists, on the other hand, are invariant, because you don't necessarily know if you will be reading from the list or writing to the list. (If reading int
s, you can take a list[bool]
in place of list[int]
; if writing bool
s, you can take a list[int]
instead a list[bool]
. But in general, neither is substitutable for the other.)
Contravariance doesn't really come up with traditional containers, but does when talking about function types. A function f
is substitutable for a function g
if f
accepts all the same arguments (but possibly more) that g
does, and returns some of the same arguments (but possibly fewer) than g
could. We describe this as saying that function types are contravariant in their argument(s) and covariant in their return types.
Upvotes: 0