haskell dependent-type algebraic-data-types

Reputation: 2023

Subset algebraic data type, or type-level set, in Haskell

Suppose you have a large number of types and a large number of functions that each return "subsets" of these types.

Let's use a small example to make the situation more explicit. Here's a simple algebraic data type:

data T = A | B | C

and there are two functions f, g that return a T

f :: T
g :: T

For the situation at hand, assume it is important that f can only return an A or a B, and g can only return a B or a C.

I would like to encode this in the type system. Here are a few reasons/circumstances why this might be desirable:

Let the functions f and g have a more informative signature than just ::T
Enforce that implementations of f and g do not accidentally return a forbidden type that users of the implementation then accidentally use
Allow code reuse, e.g. when helper functions are involved that only operate on subsets of type T
Avoid boilerplate code (see below)
Make refactoring (much!) easier

One way to do this is to split up the algebraic datatype and wrap the individual types as needed:

data A = A
data B = B
data C = C

data Retf = RetfA A | RetfB B 
data Retg = RetgB B | RetgC C

f :: Retf
g :: Retg

This works, and is easy to understand, but carries a lot of boilerplate for frequent unwrapping of the return types Retf and Retg.
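To make the boilerplate concrete, here is a minimal sketch of the wrapper encoding with one helper that only cares about B. The names describeB, fromRetf and fromRetg, and the bodies of f and g, are made up for illustration and are not part of the original code.

```haskell
-- The wrapper encoding from above, plus the unwrapping it forces.
data A = A deriving (Show, Eq)
data B = B deriving (Show, Eq)
data C = C deriving (Show, Eq)

data Retf = RetfA A | RetfB B deriving (Show, Eq)
data Retg = RetgB B | RetgC C deriving (Show, Eq)

-- Hypothetical helper that only makes sense for a B.
describeB :: B -> String
describeB B = "got a B"

-- Every return type needs its own projection to reach the shared B.
fromRetf :: Retf -> Maybe B
fromRetf (RetfB b) = Just b
fromRetf (RetfA _) = Nothing

fromRetg :: Retg -> Maybe B
fromRetg (RetgB b) = Just b
fromRetg (RetgC _) = Nothing

-- Placeholder implementations, assumed only for this sketch.
f :: Retf
f = RetfB B

g :: Retg
g = RetgC C
```

With this, a caller writes fmap describeB (fromRetf f) or fmap describeB (fromRetg g), and the number of such projections grows with every new return type that contains B.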

I don't see polymorphism being of any help here.

So, probably, this is a case for dependent types. It's not really a type-level list, rather a type-level set, but I've never seen a type-level set.

The goal, in the end, is to encode the domain knowledge via the types, so that compile-time checks are available, without having excessive boilerplate. (The boilerplate gets really annoying when there are lots of types and lots of functions.)

Upvotes: 5

Answers (3)

danidiaz

Reputation: 27771

Define an auxiliary sum type (to be used as a data kind) where each branch corresponds to a version of your main type:

{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
import Data.Kind
import Data.Void
import GHC.TypeLits

data Version = AllEnabled | SomeDisabled

Then define a type family that maps the version and the constructor name (given as a type-level Symbol) to the type () if that branch is allowed, and to the empty type Void if it's disallowed.

type Enabled :: Version -> Symbol -> Type
type family Enabled v ctor where
    Enabled SomeDisabled "C" = Void
    Enabled _ _ = ()

Then define your type as follows:

type T :: Version -> Type
data T v = A !(Enabled v "A")
         | B !(Enabled v "B")
         | C !(Enabled v "C")

(The strictness annotations are there to help the exhaustivity checker.)

Typeclass instances can be derived, but separately for each version:

deriving instance Show (T AllEnabled)
deriving instance Eq (T AllEnabled)
deriving instance Show (T SomeDisabled)
deriving instance Eq (T SomeDisabled)

Here's an example of use:

noC :: T SomeDisabled
noC = A ()

main :: IO ()
main = print $ case noC of
    A _ -> "A"
    B _ -> "B"
    -- this doesn't give a warning with -Wincomplete-patterns

This solution makes pattern-matching and construction more cumbersome, because those () are always there.

A variation is to have one type family per branch (as in Trees that Grow) instead of a two-parameter type family.
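A rough sketch of that variation, assuming one closed type family per constructor. The family names XA, XB and XC and the function widen are hypothetical and chosen only for this example.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}

import Data.Kind (Type)
import Data.Void (Void)

data Version = AllEnabled | SomeDisabled

-- One family per constructor (XA, XB, XC are hypothetical names).
type XA :: Version -> Type
type family XA v where
    XA _ = ()

type XB :: Version -> Type
type family XB v where
    XB _ = ()

-- Only C is switched off in the SomeDisabled version.
type XC :: Version -> Type
type family XC v where
    XC 'SomeDisabled = Void
    XC _ = ()

type T :: Version -> Type
data T v = A !(XA v)
         | B !(XB v)
         | C !(XC v)

deriving instance Show (T 'AllEnabled)
deriving instance Show (T 'SomeDisabled)

-- Widen a restricted value to the unrestricted version.
-- There is no clause for C: its strict field has type Void under
-- 'SomeDisabled, so no such value can be constructed.
widen :: T 'SomeDisabled -> T 'AllEnabled
widen (A ()) = A ()
widen (B ()) = B ()
```

Compared with the two-parameter family, this avoids the Symbol argument, at the cost of declaring one family for every constructor.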

Upvotes: 3

leftaroundabout

Reputation: 120711

Giving each individual value its own type scales extremely badly, and is quite unnecessarily fine-grained.

What you probably want is just to restrict the types by some property on their values. In Coq, for example, that would be a subset type:

Inductive T: Type :=
     | A
     | B
     | C.

Definition Retf: Type := { x: T | x<>C }.
Definition Retg: Type := { x: T | x<>A }.

Well, Haskell has no way of expressing such value constraints, but that doesn't stop you from creating types that conceptually fulfill them. Just use newtypes:

newtype Retf = Retf { getRetf :: T }
mkRetf :: T -> Maybe Retf
mkRetf C = Nothing
mkRetf x = Just (Retf x)

newtype Retg = Retg { getRetg :: T }
mkRetg :: ...

Then, in the implementation of f, you match on the final result of mkRetf and raise an error if it's Nothing. That way, an implementation mistake that makes f produce a C will unfortunately not be a compilation error, but it will at least be a runtime error from within the function that's actually at fault, rather than somewhere further down the line.
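As a minimal sketch of that last step, f could look roughly like this. Here rawF is a hypothetical stand-in for whatever f actually computes, and the error message is made up.

```haskell
data T = A | B | C deriving (Show, Eq)

newtype Retf = Retf { getRetf :: T } deriving (Show, Eq)

-- Smart constructor: rejects the one value Retf must never hold.
mkRetf :: T -> Maybe Retf
mkRetf C = Nothing
mkRetf x = Just (Retf x)  -- Just is needed for the Maybe result

-- Hypothetical stand-in for the real computation inside f.
rawF :: T
rawF = A

-- Validate the final result and fail right here if it is forbidden.
f :: Retf
f = case mkRetf rawF of
  Just r  -> r
  Nothing -> error "f: produced the forbidden constructor C"
```

The invariant only holds if the Retf constructor is kept out of the module's export list, so that mkRetf is the only way to build a Retf from outside.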

An alternative that might be ideal for you is Liquid Haskell, which does support subset types. I can't say too much about it, but it's supposedly pretty good (and is supposed to get direct support in newer GHC versions).

Upvotes: 1

chi

Reputation: 116139

I tried to achieve something like this in the past, but without much success -- I was not too satisfied with my solution.

Still, one can use GADTs to encode this constraint:

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data TagA = IsA | NotA
data TagC = IsC | NotC
    
data T (ta :: TagA) (tc :: TagC) where
   A :: T 'IsA  'NotC
   B :: T 'NotA 'NotC
   C :: T 'NotA 'IsC

-- existential wrappers
data TnotC where TnotC :: T ta 'NotC -> TnotC
data TnotA where TnotA :: T 'NotA tc -> TnotA

f :: TnotC
g :: TnotA

This, however, gets boring fast because of the wrapping/unwrapping of the existentials. Consumer functions are more convenient, since we can write

giveMeNotAnA :: T 'NotA tc -> Int

to require anything but an A. Producer functions instead need to use existentials.
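Here is a small self-contained sketch of both sides, assuming placeholder bodies for f and g and made-up result values. The functions nameNotC and useG are hypothetical and only show what the unwrapping looks like.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

data TagA = IsA | NotA
data TagC = IsC | NotC

data T (ta :: TagA) (tc :: TagC) where
  A :: T 'IsA  'NotC
  B :: T 'NotA 'NotC
  C :: T 'NotA 'IsC

-- Existential wrappers, as above.
data TnotC where TnotC :: T ta 'NotC -> TnotC
data TnotA where TnotA :: T 'NotA tc -> TnotA

-- Consumer: the index 'NotA rules out A, so two clauses cover everything.
giveMeNotAnA :: T 'NotA tc -> Int
giveMeNotAnA B = 1
giveMeNotAnA C = 2

-- Producers with placeholder bodies (assumed for this sketch).
f :: TnotC
f = TnotC A

g :: TnotA
g = TnotA C

-- Hypothetical users: every use of a producer's result unwraps first.
nameNotC :: TnotC -> String
nameNotC (TnotC A) = "A"
nameNotC (TnotC B) = "B"

useG :: TnotA -> Int
useG (TnotA t) = giveMeNotAnA t
```

Adding a clause such as giveMeNotAnA A = 0 should be rejected by the type checker, since 'IsA cannot match 'NotA.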

In a type with many constructors, it also gets inconvenient since we have to use a GADT with many tags/parameters. Maybe this can be streamlined with some clever typeclass machinery.

Upvotes: 1
