Pontus Hultkrantz

Reputation: 480

Python type hinting for generic container constructor

What is the correct typing for the places marked ??? below, where we convert a generic iterable container to an iterable container of a different type?

def foo(itr: Iterable, cast_type: ???) -> ???:  # For Py 3
    # type: (Iterable[Any], ???) -> ???  (For Py 2.7)
    return cast_type(itr)

foo([1,2], cast_type=set) # Example 1
foo(set([1,2]), cast_type=list) # Example 2 ...

Upvotes: 3

Views: 1294

Answers (2)

Daniil Fajnberg

Reputation: 18663

No parameterized type variables!

The problem is that, so far, the Python typing system does not allow higher-kinded type variables, meaning type variables that are themselves parameterized with another type variable. They would be helpful here: we could define a type variable T and annotate itr with Iterable[T], then define, for example, It as a type variable bounded by Iterable[T], annotate cast_type as type[It[T]], and finally annotate the return type as It[T].

Alas, this is not possible yet (though it seems to be in the works), so we need to work around it.
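
To make the limitation concrete, here is a minimal sketch, assuming standard CPython behaviour: a TypeVar object cannot be subscripted, so an annotation such as It[T] already fails when it is evaluated (and a type checker would reject it as well). The helper name can_subscript_typevar is made up for this illustration.

```python
from collections.abc import Iterable
from typing import Any, TypeVar

T = TypeVar("T")
It = TypeVar("It", bound=Iterable[Any])


def can_subscript_typevar() -> bool:
    """Return True if It[T] can be evaluated, False if it raises TypeError."""
    try:
        It[T]  # what a higher-kinded annotation would need to evaluate
    except TypeError:
        return False
    return True


print(can_subscript_typevar())
```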

No safe __init__ signature

The next problem is that there is no common constructor interface in the collections ABCs allowing an argument to be passed. We might be tempted to do the following:

from collections.abc import Iterable
from typing import Any, TypeVar

It = TypeVar("It", bound=Iterable[Any])

def foo(itr: Iterable[Any], cast_type: type[It]) -> It:
    return cast_type(itr)

But the problem is that mypy will correctly give us the error Too many arguments for "Iterable" [call-arg] for that last line. This problem remains the same, no matter which one of the abstract base classes we pick (like Collection or Set or what have you).

Extend the protocol

(EDIT: See the last section for why the __init__ protocol solution may cause problems and why it may be better to use Callable instead of type.)

To avoid this issue, we can introduce our own custom protocol that inherits from Iterable and also introduces a common __init__ interface:

from collections.abc import Iterable
from typing import Any, Protocol, TypeVar


class ConstructIterable(Iterable[Any], Protocol):
    def __init__(self, arg: Iterable[Any]) -> None: ...


It = TypeVar("It", bound=ConstructIterable)


def foo(itr: Iterable[Any], cast_type: type[It]) -> It:
    return cast_type(itr)


a = foo([1, 2], cast_type=set)
b = foo({1, 2}, cast_type=list)
reveal_type(a)
reveal_type(b)

The last two lines are there for mypy. Running it in --strict mode over this script gives no errors and the following notes:

note: Revealed type is "builtins.set[Any]"
note: Revealed type is "builtins.list[Any]"

So far so good, the container types are inferred correctly from our argument types.

Preserve the element types selectively

But we are losing the element type information this way; it is currently just Any. As I said, there is no good way to solve this right now, but we can work around it if we are willing to decide on the most common use cases for the function.

If we anticipate it being called most often with common container types like list, tuple and set for example, we can write overloads specifically for those and leave a catch-all non-generic Iterable case as a fallback. That last case will then still drop the element type information, but at least the other signatures will preserve it.

Here is an example:

from collections.abc import Iterable
from typing import Any, Protocol, TypeVar, overload


class ConstructIterable(Iterable[Any], Protocol):
    def __init__(self, _arg: Iterable[Any]) -> None: ...


T = TypeVar("T")
It = TypeVar("It", bound=ConstructIterable)


@overload
def foo(itr: Iterable[T], cast_type: type[list[Any]]) -> list[T]:
    ...

@overload
def foo(itr: Iterable[T], cast_type: type[tuple[Any, ...]]) -> tuple[T, ...]:
    ...

@overload
def foo(itr: Iterable[T], cast_type: type[set[Any]]) -> set[T]:
    ...

@overload
def foo(itr: Iterable[T], cast_type: type[It]) -> It:
    ...

def foo(
    itr: Iterable[Any],
    cast_type: type[ConstructIterable],
) -> ConstructIterable:
    return cast_type(itr)

Test it again with mypy:

...

a = foo([1, 2], cast_type=set)
b = foo({1., 2.}, cast_type=list)
c = foo([1, 2], cast_type=tuple)
d = foo({1., 2.}, cast_type=frozenset)
reveal_type(a)
reveal_type(b)
reveal_type(c)
reveal_type(d)

Output:

note: Revealed type is "builtins.set[builtins.int]"
note: Revealed type is "builtins.list[builtins.float]"
note: Revealed type is "builtins.tuple[builtins.int, ...]"
note: Revealed type is "builtins.frozenset[Any]"

As you can see, at least the first three cases correctly preserved the element types. I am afraid this not-so-elegant solution is as good as it gets with the current type system limitations.


PS

It seemed as though your question included one about Python 2, but I don't think that merits a response. Nobody should be using Python 2 today. Not to mention the typing system was essentially non-existent back then. The solution I showed above requires 3.9, but can probably be made compatible with slightly older versions of Python by using typing_extensions, as well as the deprecated typing.List, typing.Tuple, and such.
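
For older interpreters, a rough sketch of the same overload pattern with the deprecated typing aliases might look like the following. It covers only list and set to stay short, and it has not been checked against any particular mypy release. On Python 3.7, Protocol would presumably have to be imported from typing_extensions instead of typing.

```python
from typing import Any, Iterable, List, Set, Type, TypeVar, overload

# Assumption: Python 3.8+. On 3.7, import Protocol from typing_extensions.
from typing import Protocol


class ConstructIterable(Iterable[Any], Protocol):
    def __init__(self, _arg: Iterable[Any]) -> None: ...


T = TypeVar("T")
It = TypeVar("It", bound=ConstructIterable)


@overload
def foo(itr: Iterable[T], cast_type: Type[List[Any]]) -> List[T]: ...


@overload
def foo(itr: Iterable[T], cast_type: Type[Set[Any]]) -> Set[T]: ...


@overload
def foo(itr: Iterable[T], cast_type: Type[It]) -> It: ...


def foo(
    itr: Iterable[Any],
    cast_type: Type[ConstructIterable],
) -> ConstructIterable:
    # Same implementation as before; only the annotations changed.
    return cast_type(itr)
```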


EDIT

Thanks to @joel for pointing out that __init__ signatures are not checked when subclasses are passed. Instead of using type, it might be safer to go with the supertype Callable and specify the argument and return types accordingly. This also makes the custom protocol unnecessary.
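
A small sketch of the pitfall, using the protocol-based foo from earlier. PaddedList and call_fails are hypothetical names invented for this example. As the note above says, a checker may accept the call because the subclass's incompatible __init__ is not compared against the protocol, yet at runtime the constructor is missing an argument.

```python
from collections.abc import Iterable
from typing import Any, Protocol, TypeVar


class ConstructIterable(Iterable[Any], Protocol):
    def __init__(self, _arg: Iterable[Any]) -> None: ...


It = TypeVar("It", bound=ConstructIterable)


def foo(itr: Iterable[Any], cast_type: type[It]) -> It:
    return cast_type(itr)


class PaddedList(list[Any]):
    """Hypothetical list subclass whose __init__ needs two arguments."""

    def __init__(self, items: Iterable[Any], padding: int) -> None:
        super().__init__(items)
        self.extend([None] * padding)


def call_fails() -> bool:
    """Return True if passing PaddedList to foo raises TypeError."""
    try:
        foo([1, 2], cast_type=PaddedList)  # foo passes only one argument
    except TypeError:
        return True
    return False


print(call_fails())
```

With a Callable annotation, the expected call signature is spelled out explicitly, so a checker has something concrete to compare PaddedList's constructor against.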

The adjusted workaround solution would then look like this:

from collections.abc import Callable, Iterable
from typing import Any, TypeVar, overload


T = TypeVar("T")
It = TypeVar("It", bound=Iterable[Any])


@overload
def foo(itr: Iterable[T], cast_type: type[list[Any]]) -> list[T]:
    ...


@overload
def foo(itr: Iterable[T], cast_type: type[tuple[Any, ...]]) -> tuple[T, ...]:
    ...


@overload
def foo(itr: Iterable[T], cast_type: type[set[Any]]) -> set[T]:
    ...


@overload
def foo(itr: Iterable[T], cast_type: Callable[[Iterable[Any]], It]) -> It:
    ...


def foo(
    itr: Iterable[Any],
    cast_type: Callable[[Iterable[Any]], It],
) -> It:
    return cast_type(itr)

The mypy output for our test lines is essentially the same, except that the last call now reveals "builtins.frozenset[_T_co`1]" instead, which amounts to the same thing.
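
A side effect of the Callable annotation is that cast_type no longer has to be a class. The sketch below shows runtime behaviour only, with the overloads left out for brevity. How a given type checker infers the result for sorted or for the lambda has not been verified here.

```python
from collections.abc import Callable, Iterable
from typing import Any, TypeVar

It = TypeVar("It", bound=Iterable[Any])


def foo(itr: Iterable[Any], cast_type: Callable[[Iterable[Any]], It]) -> It:
    return cast_type(itr)


# A plain function that accepts an iterable and returns a list.
print(foo([3, 1, 2], cast_type=sorted))

# A class used as a callable, as in the original examples.
print(foo("abca", cast_type=frozenset))

# An ad-hoc callable that builds a list of doubled elements.
print(foo([1, 2], cast_type=lambda xs: [x * 2 for x in xs]))
```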

Upvotes: 4

joel

Reputation: 7877

I'm not convinced this is possible in Python. The types would be something like

T = TypeVar("T")

F = TypeVar("F", bound=Iterable)
G = TypeVar("G")

def foo(itr: F[T], cast_type: Callable[[F[T]], G[T]]) -> G[T]:
    return cast_type(itr)

but typevars F and G are parametrized by other types, which Python doesn't support.

Upvotes: 3
