What is the best way of threading values around between different processes in Haskell

Question

Something I have been thinking about recently, when working with a larger Haskell system, how do you thread values around between processes in an elegant way?

I have been learning more and more about the relationship between function input and output, laws like injectivity, surjectivivity, non-injective functions and so on... a realisation I have had is that most functions in a realworld application, that are processing data from a business logic point of view are reducing values down to smaller and smaller values until you effectively end up at some primative type (typically Bool) certainly for conditional style business logic. Or perhaps some other value on a field in your DB that you want to update.

So lets look at an example of what I might mean. Lets say we have a system with a table of Users, a table of Products, and maybe a table of Offers. A user can basket a product, and a product may or may not have an offer you could apply to it. Here I will illustrate a pure function which determines that an offer is applicable to a product, and another that calculates a discount for that product:

-- lets also assume these types are Persistent data types
data User = User
  { userName :: Text
  , userAge :: Int
  , userHasDiscounts :: Bool
  , userAccountDisabled :: Bool
  }

data Product = Product
  { productName :: Text
  , productCost :: Money
  }

data Offer = Offer
  { offerName :: Text
  , offerDiscount :: Discount
  , offerProduct :: ProductId
  }

productDiscountable :: Entity User -> Entity Product -> Entity Offer -> Bool
productDiscountable (Entity _ user) (Entity productKey _) (Entity _ offer) =
  not (userAccountDisabled user) && userHasDiscounts user && productKey == offerProduct offer

productWithDiscountApplied :: Entity User -> Entity Product -> Entity Offer -> Product
productWithDiscountApplied user (Entity productKey product) offer = product { productCost = discountedCost }
  where
    discountedCost = 
      if not (productDiscountable user (Entity productKey product) offer)
        then productCost product
        else applyDiscount (offerDiscount offer) (productCost product)

applyDiscount :: Discount -> Money -> Money
applyDiscount = undefined

This might not be an amazing example, and my question is more about how things scale across a broader system... but note in the example for some functions we are discarding data we receive as input, in productDiscountable we throw away the Product and just take the entity key, and throw away the entity key for User and Offer just to use the inner values. The same is the case in productWithDiscountApplied where we are taking in some values we do not strictly need. What if we wanted to calculate a discount with an offer that has not yet been saved to our DB for example?

It seems like the types would make more sense to be as small as they need to be:

productDiscountable :: User -> Key Product -> Offer -> Bool
productDiscountable user productKey offer =
  not (userAccountDisabled user) && userHasDiscounts user && productKey == offerProduct offer

productWithDiscountApplied :: User -> Entity Product -> Offer -> Product
productWithDiscountApplied user (Entity productKey product) offer = product { productCost = discountedCost }
  where
    discountedCost =
      if not (productDiscountable user productKey offer)
        then productCost product
        else applyDiscount (offerDiscount offer) (productCost product)

This seems much cleaner to me and is typically how I write Haskell currently. The issue I have here is that in a real world system where all of these peices have to fit together different parts of the system have different data requirements, and some parts may require reading from lots of different tables. In the system I work on, we have functions like the above, only with 5/6 different entities in some cases. In these cases, I sometimes define a larger type to contain all of the different pieces. So something like:

data SomeProcessType = SomeProcessType
  { sptUser :: Entity User
  , sptProduct :: Entity Product
  , sptOffer :: Entity Offer
  -- possibly more types contained within this type
  }

But much like the reasoning behind narrowing the types with the Entity type, there is reason we may want to narrow this type as well, lets say we want to bundle the arguments for productDiscountable into a type like this:

data ProductDiscountableParams = ProductDiscountableParams
  { productDiscountableParamsUser :: User
  , productDiscountableParamsProduct :: Key Product
  , productDiscountableParamsOffer :: Offer
  }

And you can map from the larger type to this type with some function:

productDiscountable $ ProductDiscountableParams 
  { productDiscountableParamsUser = entityVal (sptUser spt)
  , productDiscountableParamsProduct = entityKey (sptProduct spt)
  , productDiscountableParamsOffer = entityVal (sptOffer spt)
  }

But then this leads me down a train of thought that ProductDiscountableParams should only contain values that it actually needs to calculate the result, much like the original function didnt need the keys from the Entity, there are also some fields from the types that are not required either. We dont access userAge from user, so why are we passing it? What if ProductDiscountableParams was defined with only the fields that it needs:

data ProductDiscountableParams = ProductDiscountableParams
  { productDiscountableParamsUserDisabled :: Bool
  , productDiscountableParamsUserHasDiscounts :: Bool
  , productDiscountableParamsOfferProductKey :: Key Product
  , productDiscountableParamsProductKey :: Key Product
  }

productDiscountable $ ProductDiscountableParams 
  { productDiscountableParamsUserDisabled = userDisabled (entityVal (sptUser spt))
  , productDiscountableParamsUserHasDiscounts = userHasDiscounts (entityVal (sptUser spt))
  , productDiscountableParamsOfferProductKey = offerProduct (entityVal (sptOffer spt))
  , productDiscountableParamsProductKey = entityKey (sptProduct spt)
  }

This seems nice from the point of view of the function, in that it is only taking fields that it requires. It is probably easier to write a test for in that it only needs some basic values. But it does not seem as safe (two pairs fields with the same types) and is not particularly nice to look at from the calling site point of view.

I have started to use this approach in some parts of my application but am in two minds about it.

The issue as well with having types like SomeProcessType that span multiple business processes say SomeOtherProcessType exists as well, with similar but not identical context, these two types need a way of getting to ProductDiscountableParams in order to call that, if they both need to call that function for whatever reason.

Another approach I have seen online is people using typeclasses a bit like you would an interface in an OO language. You might instead define ProductDiscountableParams as a typeclass:

class ProductDiscountableParams a where
  productDiscountableParamsUserDisabled :: a -> Bool
  productDiscountableParamsUserHasDiscounts :: a -> Bool
  productDiscountableParamsOfferProductKey :: a -> Key Product
  productDiscountableParamsProductKey :: a -> Key Product

You could define instances here much in the same way as the mapping described earlier:

instance ProductDiscountableParams SomeProcessType where
  productDiscountableParamsUserDisabled = userDisabled . entityVal . sptUser
  productDiscountableParamsUserHasDiscounts = userHasDiscounts . entityVal . sptUser
  productDiscountableParamsOfferProductKey = offerProduct . entityVal . sptOffer
  productDiscountableParamsProductKey = entityKey . sptProduct

Which makes it so that the calling site is much cleaner:

productDiscountable (spt :: SomeProcessType)

What are the best practices to stick to with regards the above? I feel like leaning on always trying to reduce the type down to its smallest representation is not always the way to go, but I also don't feel great about passing around the database entities either, and that there should be a nicer way to model problem input/output.

Jon Purdy · Accepted Answer

As you’ve observed, there’s often a tradeoff in business logic between giving a function a minimal domain to enforce correctness, and the convenience of actually calling that function on the restricted domain. Haskell record types are a bit unwieldy for this use case because you have to explicitly interoperate between them, and can’t say “I will only use such-and-such fields” in an ad-hoc fashion.

One thing I do in this circumstance is separating the two concerns by having an implementation function that takes only the exact information it needs, and a separate interface function that accepts a larger type and extracts those fields to pass to the actual logic.

productDiscountable :: SomeProcessType -> Bool
productDiscountable spt = productDiscountable u p o
  where
    -- Extract relevant info from larger types.
    u = entityVal (sptUser spt)
    p = entityKey (sptProduct spt)
    o = entityVal (sptOffer spt)

productDiscountable' :: User -> Key Product -> Offer -> Bool
productDiscountable' u p o = …  -- Use restricted types for business logic.

And you can still make a separate type like ProductDiscountableParams for the restricted fields if you want.

I wouldn’t generally reach for typeclasses for this purpose, because all they provide here is overloading rather than laws, but if you want to go down that route because you have several distinct types that fulfill this interface, this is a natural place to use lenses.

{-# LANGUAGE InstanceSigs #-}

import Control.Lens (Lens')  -- Use whichever lens library you prefer
import Data.Functor ((<&>))  -- Flipped <$>, convenient for defining lenses

class HasProductDiscountableParams a where
  productDiscountableParams :: Lens' a (User, Key Product, Offer)
  -- or: Lens' a ProductDiscountableParams

instance HasProductDiscountableParams SomeProcessType where

  productDiscountableParams :: Lens' SomeProcessType (User, Key Product, Offer)
  -- ===
  --   :: (Functor f)
  --   => ((User, Key Product, Offer) -> f (User, Key Product, Offer))
  --   -> SomeProcessType -> f SomeProcessType

  productDiscountableParams f
    spt { sptUser = u, sptProduct = p, sptOffer = o }

    -- Extract relevant fields
    = f (u, p, o)

    -- Reconstitute record
    <&> \ (u', p', o') -> spt { sptUser = u { … = u' … }, … }

Then you can use view productDiscountableParams spt to extract only those portions you’re interested in, or over / set to update them. (I’ve left some “holes” here because I’m not familiar with Persistent, so I hope it’s clear nevertheless.)

Another option that may be adaptable for your use case is the higher-kinded data pattern (HKD), where the same record is reused with different internal “wrapper” types, matching on Identity to remove wrapping entirely:

{-# LANGUAGE TypeFamilies #-}

import Data.Functor.Identity (Identity)

type family HKD f a where
  HKD Identity a =   a
  HKD f        a = f a

data SomeProcessTypeF f = SomeProcessTypeF
  { sptUser    :: HKD f User
  , sptProduct :: HKD f Product
  , sptOffer   :: HKD f Offer
  …
  }

Then you can instantiate it at different types to recover the wrapped and unwrapped versions:

type SomeProcessType = SomeProcessTypeF Entity

type ProductDiscountableParams = SomeProcessTypeF Identity

You may need to munge your types into a slightly different shape to make it fit this pattern, or use another type family to handle the conversions.

What is the best way of threading values around between different processes in Haskell

Answers (1)

Related Questions