How to design for extension using records

Question

I'd like to know how people from the Haskell community approach the following design. Assume a workflow-like system where you are piping some data (structure) through multiple steps in the system. As that data flows through the system, more and more data items are to be added to that structure that are not available in prior steps. Now I want to make sure that data items that are not available in a prior step are unaccessible - ideally via compile time checking.

So far, I came up with two different approaches.

Approach 1: recreate all types over and over again:

module Step1 where

    data A = A { item1 :: SomeType }

module Step2 where

    data B = B { item1 :: SomeType, item2 :: SomeOtherType }

    fromAtoB :: A -> B

module Step3 where

    data C = C { item1 :: SomeType, item2 :: SomeOtherType, item3 :: SomeOtherTypeAgain }

    fromBtoC :: B -> C

Obviously this becomes pretty burdensome the more steps there are and the deeper and wider the data types are defined.

Approach 2: compose the types:

module Step1 where

    data A = A { item1 :: SomeType }

module Step2 where

    data B = B { a :: A , item2 :: SomeOtherType }

    fromAtoB :: A -> B

module Step3 where

    data C = C { b :: B, item3 :: SomeOtherTypeAgain }

    fromBtoC :: B -> C

This approach has the problem that the user of any given step is suddenly exposed to all the steps before, as the access to some of the properties is different than to others (for instance, cInstance.b.a.Item1 vs. cInstance.Item1), een though for the user of any given step the data structure is naturally flat. Indeed he/she will not necessarily even know that there are steps before their own step. In an OO system, I would simply extend C from B and B from A.

Any ideas are greatly welcome.

Alec · Accepted Answer

If you want to avoid language extensions, the two solutions you propose are the way to go. For the variant where you get nesting, I'd recommend you {-# UNPACK #-} the nested data. That way you avoid indirection at runtime at least.

If you really want to use something subtyping-like, check out this solution I came up with a couple of days ago.

However, I think for this problem you would be better off following an approach commonly used for data that gets transformed stage to stage (GHC uses something similar for processing Haskell AST). Basically, you make a type familiy that "hides" fields until the right stage by given them type () until the right stage.

{-# LANGUAGE TypeFamilies, DataKinds #-}

data Stage = A | B | C

-- | A data type containing the final set of fields
data Complete (stage :: Stage) = Complete
  { item1 :: RestrictedUntilAfter A stage SomeType
  , item2 :: RestrictedUntilAfter B stage SomeOtherType
  , item3 :: RestrictedUntilAfter C stage SomeOtherTypeAgain
  }

-- | Compares the two given stages to determine if the result type should be hidden
-- as `()` or not
type family RestrictedUntilAfter (s1 :: Stage) (s2 :: Stage) x :: * where
  RestrictedUntilAfter B A _ = ()
  RestrictedUntilAfter C A _ = ()
  RestrictedUntilAfter C B _ = ()
  RestrictedUntilAfter _ _ t = t

Then, your types going through the pipeline are Complete A, Complete B, and Complete C. Fields that are restricted until a certain stage will have type () before that stage.

c1 = Complete { item1 = x, item2 = (), item3 = () } :: Complete A -- x :: SomeType
c2 = Complete { item1 = x, item2 = y, item3 = () } :: Complete B  -- y :: SomeOtherType
c3 = Complete { item1 = x, item2 = y, item3 = z } :: Complete C   -- z :: SomeOtherTypeAgain

(The type family might be better being open, or pattern-matched in a different order, but the idea is the same)

EDIT

As I suspected, there is a cleaner type family approach. In fact, with this approach, you don't even need to define any type families and it scales nicely in terms of LOC as you add both stages and fields. Finally, it is even more flexible. However, it does depend on type-list.

{-# LANGUAGE TypeFamilies, DataKinds, TypeOperators #-}

import Data.Type.List
import Data.Type.Bool

data Stage = A | B | C

type RestrictedTo stage validStages ty = If (Find stage validStages) ty ()

-- | A data type containing the final set of fields
data Complete (stage :: Stage) = Complete
  { item1 :: stage `RestrictedTo` [A,B,C] SomeType
  , item2 :: stage `RestrictedTo` [B,C]   SomeOtherType
  , item3 :: stage `RestrictedTo` [C]     SomeOtherTypeAgain
  }

Now, you can even have a field that is non-() only at stages A and C (but not B): item4 :: stage `RestrictedTo` [A,C] SomeOtherOtherType

How to design for extension using records

Answers (1)

EDIT

Related Questions