Richard

Reputation: 3414

Confusion with using Mypy with class inheritance - List vs Sequence

Please excuse my confusion: I'm new to using typing and am trying to use it along with mypy for checking. The problem I've run into seems to come up a lot for people who are just starting out with typing and mypy.

Problem

I'm trying to define an abstract composition of dataclasses that will be subclassed into concrete classes that add extra data.

So in a simplified form I'm trying to do the following:

from dataclasses import dataclass
from typing import List

@dataclass
class TestResultImage:
    base_var_a: int 


@dataclass
class TestSeries:
    imgs: List[TestResultImage]

# --- concrete instances -------

@dataclass
class SpecificImageType1(TestResultImage):
    specific_var_b: float
    specific_var_c: int 


@dataclass
class SpecificSeries(TestSeries):
    imgs: List[SpecificImageType1]

Mypy fails on the above with the error:

error: Incompatible types in assignment (expression has type "List[SpecificImageType1]", base class "TestSeries" defined the type as "List[TestResultImage]")
note: "List" is invariant -- see http://mypy.readthedocs.io/en/latest/common_issues.html#variance
note: Consider using "Sequence" instead, which is covariant

Fix

Changing List to Sequence solves the problem, as the error note suggests.
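For reference, the question's code with that fix applied might look like the sketch below. The class definitions are the question's own; only the annotations change, and the final two lines are an illustrative addition that builds one concrete series.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TestResultImage:
    base_var_a: int


@dataclass
class TestSeries:
    # Sequence is read-only (covariant), so subclasses may narrow the element type
    imgs: Sequence[TestResultImage]


# --- concrete instances -------

@dataclass
class SpecificImageType1(TestResultImage):
    specific_var_b: float
    specific_var_c: int


@dataclass
class SpecificSeries(TestSeries):
    # Narrowing Sequence[TestResultImage] to Sequence[SpecificImageType1] passes mypy
    imgs: Sequence[SpecificImageType1]


# Illustrative usage: a list is accepted wherever a Sequence is expected
series = SpecificSeries(imgs=[SpecificImageType1(1, 2.0, 3)])
print(series.imgs[0].specific_var_b)  # 2.0
```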

Question

I have seen quite a few Stack Overflow questions and mypy GitHub issues about this, and about the confusion it causes.

So I then went and attempted to read as many of the Mypy docs as possible.

But it's still, IMHO, pretty confusing why List is problematic when you're subclassing, or perhaps why "List is invariant, but Sequence is covariant".

So I'm asking, perhaps on behalf of others like me who are trying to use typing (and therefore mypy) for more than trivial examples: are there any good explanations of why List is problematic, with some examples?

Upvotes: 2

Views: 1552

Answers (1)

Michael0x2a

Reputation: 64188

Suppose we add the following to your original code:

def check_specific_images(imgs: List[SpecificImageType1]) -> None:
    for img in imgs:
        # Assumes every element has the subclass-only attribute
        print(img.specific_var_b)

def modify_series(series: TestSeries) -> None:
    # Legal for any TestSeries: its imgs is a List[TestResultImage]
    series.imgs.append(TestResultImage(1))

specific = SpecificSeries(imgs=[
    SpecificImageType1(1, 2.0, 3),
    SpecificImageType1(4, 5.0, 6),
])

modify_series(specific)
check_specific_images(specific.imgs)  # AttributeError on the appended element

At first glance this program ought to type check: specific is an instance of SpecificSeries, a subclass of TestSeries, so calling modify_series(specific) is legal. Similarly, specific.imgs has type List[SpecificImageType1], so calling check_specific_images(specific.imgs) is also legal.

However, if we actually run this program, we'll get a runtime error when we call check_specific_images! The call to modify_series added a plain TestResultImage object to our List[SpecificImageType1], and since that object has no specific_var_b attribute, the subsequent call to check_specific_images crashes with an AttributeError.

This problem is fundamentally why mypy (or pretty much any other sane type system) will not let List[SpecificImageType1] be treated as a subtype of List[TestResultImage]. In order for one type to be a valid subtype of another, it should be possible to safely use the subtype in any location that expects the parent type. This is simply not true for lists.

Why? Because lists support write operations. It should always be safe to insert a TestResultImage (or any subtype of TestResultImage) into a List[TestResultImage], but that is not safe for a List[SpecificImageType1].
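It may help to separate two kinds of subtyping here. Invariance concerns the list type itself, not the elements it holds: a List[TestResultImage] may freely contain SpecificImageType1 objects. A minimal sketch reusing the question's class names (the variable names are illustrative):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TestResultImage:
    base_var_a: int


@dataclass
class SpecificImageType1(TestResultImage):
    specific_var_b: float
    specific_var_c: int


# Fine: each element is a TestResultImage (subclass instances included),
# and writing any TestResultImage into this list stays type-safe.
mixed: List[TestResultImage] = [TestResultImage(1), SpecificImageType1(4, 5.0, 6)]
mixed.append(SpecificImageType1(7, 8.0, 9))

specific: List[SpecificImageType1] = [SpecificImageType1(1, 2.0, 3)]

# Rejected by mypy: List is invariant, so List[SpecificImageType1] is not a
# List[TestResultImage], even though every element is one.
# alias: List[TestResultImage] = specific

# Copying into a new list is accepted: appends to `widened` cannot
# leak into `specific`, because they are different list objects.
widened: List[TestResultImage] = list(specific)
widened.append(TestResultImage(10))
print(len(specific), len(widened))  # 1 2
```

The commented-out alias line is exactly the situation from the example above: if it were allowed, appending through alias would silently put a plain TestResultImage into specific.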


So if the problem is that lists are mutable, what if we instead use a type that supports only read operations? That would let us side-step the problem entirely.

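This is exactly what Sequence is: a type that exposes only the read-only methods that lists support (and is a supertype of List). Note that this restricts what you can do through the type, not the object itself: a Sequence[SpecificImageType1] may well be a list at runtime, but the type checker won't let you call append on it through that type.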
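A small sketch of that read-only interface in action. The function total_base is a hypothetical example, not from the question; the checks at the end use the runtime collections.abc.Sequence ABC, which mirrors the methods the typing Sequence exposes.

```python
from collections.abc import Sequence as AbcSequence
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TestResultImage:
    base_var_a: int


@dataclass
class SpecificImageType1(TestResultImage):
    specific_var_b: float
    specific_var_c: int


def total_base(imgs: Sequence[TestResultImage]) -> int:
    # Only read operations (iteration here) are available on a Sequence,
    # so mypy would reject imgs.append(...) inside this function.
    return sum(img.base_var_a for img in imgs)


specific: List[SpecificImageType1] = [SpecificImageType1(1, 2.0, 3), SpecificImageType1(4, 5.0, 6)]
print(total_base(specific))  # 5: accepted, since Sequence is covariant
print(total_base((TestResultImage(10),)))  # 10: tuples are Sequences too

# Runtime view of the same interface: list is registered as a Sequence,
# but the Sequence ABC itself defines no mutating methods such as append.
print(issubclass(list, AbcSequence), hasattr(AbcSequence, "append"))  # True False
```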


More broadly, let's suppose we have some sort of generic type Wrapper[T] along with two classes Parent and Child where Child is a subtype of Parent.

This then raises the question: how does Wrapper[Parent] relate to Wrapper[Child]?

There are four possible answers to this:

  • Wrapper is covariant: Wrapper[Child] is a subtype of Wrapper[Parent].

  • Wrapper is contravariant: Wrapper[Parent] is a subtype of Wrapper[Child].

  • Wrapper is invariant: Wrapper[Parent] and Wrapper[Child] are unrelated to each other and neither is a subtype of the other.

  • Wrapper is bivariant: Wrapper[Parent] is a subtype of Wrapper[Child] and Wrapper[Child] is a subtype of Wrapper[Parent].
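To make the first three answers concrete, here are standard typing constructs that mypy treats each way: Sequence is covariant, the parameter types of Callable are contravariant, and List is invariant. The Parent/Child classes and the helper functions are illustrative only.

```python
from typing import Callable, List, Sequence


class Parent:
    pass


class Child(Parent):
    pass


def read_all(items: Sequence[Parent]) -> int:
    return len(items)


def feed_child(handler: Callable[[Child], None]) -> None:
    handler(Child())


def handle_parent(p: Parent) -> None:
    print(type(p).__name__)


def append_parent(items: List[Parent]) -> None:
    items.append(Parent())


children: List[Child] = [Child(), Child()]

# Covariant: a Sequence[Child] is usable where a Sequence[Parent] is expected.
print(read_all(children))  # 2

# Contravariant: a Callable[[Parent], None] is usable where a
# Callable[[Child], None] is expected, since a handler for any Parent
# can certainly handle a Child.
feed_child(handle_parent)  # prints "Child"

# Invariant: mypy rejects the call below, since List[Child] is not a
# List[Parent] (it would let append_parent put a plain Parent into `children`).
# append_parent(children)
```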

When you're defining Wrapper[T], mypy will let you pick whether you want that type to be covariant, contravariant, or invariant. Once you've made your choice, mypy will then enforce the following rules:

  1. If a class is covariant, it can only support read operations against T. In practice, this means you're disallowed from defining methods that accept anything of type T (mypy does make an exception for __init__).
  2. If a class is contravariant, it can only support write operations against T. In practice, this means you're disallowed from defining methods that return anything of type T.
  3. If a class is invariant, it can support both read and write operations against T. There are no restrictions on what types of methods you can define.
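Here is a minimal sketch of how those rules look when you define your own wrappers. The variance is chosen through the real covariant= and contravariant= arguments of typing.TypeVar; the class and function names (ReadOnlyWrapper, WriteOnlyWrapper, ReadWriteWrapper, read_parent, write_child) are hypothetical, and the commented-out methods are the ones mypy would reject.

```python
from typing import Generic, TypeVar

T = TypeVar("T")  # invariant (the default)
T_co = TypeVar("T_co", covariant=True)
T_contra = TypeVar("T_contra", contravariant=True)


class ReadOnlyWrapper(Generic[T_co]):
    """Covariant: T_co appears only in return types (and __init__, which mypy permits)."""

    def __init__(self, item: T_co) -> None:
        self._item = item

    def get(self) -> T_co:
        return self._item

    # def set(self, item: T_co) -> None: ...  # mypy rejects: covariant T_co used as a parameter


class WriteOnlyWrapper(Generic[T_contra]):
    """Contravariant: T_contra appears only in parameter types."""

    def __init__(self) -> None:
        self.count = 0  # how many items have been written

    def put(self, item: T_contra) -> None:
        self.count += 1

    # def get(self) -> T_contra: ...  # mypy rejects: contravariant T_contra used as a return type


class ReadWriteWrapper(Generic[T]):
    """Invariant: T may appear in parameters and return types alike."""

    def __init__(self, item: T) -> None:
        self._item = item

    def get(self) -> T:
        return self._item

    def set(self, item: T) -> None:
        self._item = item


class Parent:
    pass


class Child(Parent):
    pass


def read_parent(w: ReadOnlyWrapper[Parent]) -> Parent:
    return w.get()


def write_child(w: WriteOnlyWrapper[Child]) -> None:
    w.put(Child())


child_box: ReadOnlyWrapper[Child] = ReadOnlyWrapper(Child())
read_parent(child_box)  # OK: ReadOnlyWrapper[Child] is a ReadOnlyWrapper[Parent]

parent_sink: WriteOnlyWrapper[Parent] = WriteOnlyWrapper()
write_child(parent_sink)  # OK: WriteOnlyWrapper[Parent] is a WriteOnlyWrapper[Child]

# ReadWriteWrapper[Child] and ReadWriteWrapper[Parent] are unrelated, but a
# ReadWriteWrapper[Parent] can still hold a Child value.
cell: ReadWriteWrapper[Parent] = ReadWriteWrapper(Parent())
cell.set(Child())
```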

Mypy doesn't let you create bivariant types: the only time when such a type would be safe is if it supported neither read nor write operations against T -- which would be pretty pointless.
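As a small aside (a CPython runtime detail, separate from mypy's own checking), the typing module itself refuses to build such a type variable: asking for one that is both covariant and contravariant raises ValueError. The exact message differs between Python versions, so this sketch only records whether the error occurred.

```python
from typing import TypeVar

bivariant_rejected = False
try:
    # Request a type variable that is both covariant and contravariant
    TypeVar("T_bi", covariant=True, contravariant=True)
except ValueError as exc:
    bivariant_rejected = True
    print(f"ValueError: {exc}")
```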

You usually only see bivariant types in programming languages/type systems that intentionally want to make generics as simple as possible, even if it means letting the user introduce bugs like the one shown above into their program.

The high-level intuition here is that supporting either read operations or write operations against T will place constraints on how Wrapper[Parent] is related to Wrapper[Child] -- and if you support both kinds of operations, the combined constraints will end up making the two types be simply unrelated.

Upvotes: 6
