Jason Hite
Jason Hite

Reputation: 621

Custom text representation for sum type

I'm having trouble grappling with how to turn a sum type into a string without doing a whole bunch of pattern matching. A [simplified] example might be something like this:

data Shape = Rectangle Int Float Float |
             Circle Int Float |
             Ellipse Int Float Float
             deriving (Show, Eq) 

What I want is a function renderShape :: Shape -> String that takes a Shape and gives me a string that represents the argument in a certain way (I'm generating lines in an input file for a simulation program). Note that I would like this to be distinct from show, as it is doing a bit of extra formatting beyond just turning the data structure into a string. What I actually need it to do is kind of complicated, but for example purposes (because if you can explain generally how to do this I think I can work out the rest), let's just say I want to stick a semicolon on the end and separate with a comma e.g.

renderShape $ Rectangle 1 2 2
>> "Rectangle,1,2,2;"

Specifically, what I'm struggling with is that the different constructors for the type have varying number of arguments. I could do something with pattern matching, like

renderShape (Rectangle i x y) = (intercalate "," ["Rectangle", show i, show x, show y]) ++ ";"
renderShape (Circle i x) = ...
...

but I'd prefer not to as in reality I have a ton of these types of shapes so it'd be awfully tedious.

I've found something that "works" by abusing the derived show instance and simply transforming the string, but it seems really ugly:

{-# LANGUAGE OverloadedStrings #-}
import Data.List (intercalate)

renderShape :: Shape -> String
renderShape s = intercalate "," theWords ++ ";"
  where theWords = (words . show) s

So my question is, what's the "nice" way to do this? It seems to me like there has to be a simple and clean way to go about it, but I can't figure it out for the life of me. I'm also not hugely experienced with Haskell, so if my whole approach is fundamentally wrong or non-idiomatic then I'd welcome alternatives.

Upvotes: 1

Views: 262

Answers (2)

hao
hao

Reputation: 10228

This is an ideal usage of generics. Our strategy will be to convert the datatype into its generic representation (from :: (Generic a) => a -> Rep a) and then recurse into Rep a. Rep a is actually a type function, so let's see what it looks like:

λ> :info Shape
[lots of garbage]
type instance Rep Shape
  = D1
      Main.D1Shape
      (C1
         Main.C1_0Shape
         (S1 NoSelector (Rec0 Int)
          :*: (S1 NoSelector (Rec0 Float) :*: S1 NoSelector (Rec0 Float)))
       :+: (C1
              Main.C1_1Shape
              (S1 NoSelector (Rec0 Int) :*: S1 NoSelector (Rec0 Float))
            :+: C1
                  Main.C1_2Shape
                  (S1 NoSelector (Rec0 Int)
                   :*: (S1 NoSelector (Rec0 Float) :*: S1 NoSelector (Rec0 Float)))))

Oof. We're in higher-kinded territory here. To traverse this data structure we'll need to recurse into the type itself. Whereas value recursion works by repeatedly calling a function on smaller and smaller parts of a data structure, we will create a one-method class called Jason and then instantiate the method on smaller and smaller parts of the type.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators #-}

import GHC.Generics

data Shape = Rectangle Int Float Float |
             Circle Int Float |
             Ellipse Int Float Float
             deriving (Show, Eq, Generic)

class Jason a where
  jasonShow :: a -> String

-- Integers can use the regular show.
instance Jason Float where jasonShow = show
-- So can floats.
instance Jason Int where jasonShow = show

-- Constant values are easy.
instance (Jason c) => Jason (K1 i c p) where jasonShow = jasonShow . unK1

-- Use generics to pattern match into constructors.
instance (Jason (f p), Constructor c) => Jason (M1 C c f p) where
  jasonShow constructor@(M1 x) = conName constructor ++ "," ++ jasonShow x

-- We don't care about selectors.
instance (Jason (f p)) => Jason (M1 S c f p) where
  jasonShow (M1 x) = jasonShow x

-- Or whether something is a datatype or not.
instance (Jason (f p)) => Jason (M1 D c f p) where
  jasonShow (M1 x) = jasonShow x

-- We don't care about the index into the disjoint union (a.k.a. "|") of a datatype.
instance (Jason (f p), Jason (g p)) => Jason ((f :+: g) p) where
  jasonShow (L1 x) = jasonShow x
  jasonShow (R1 x) = jasonShow x

-- We want to insert a comma when we encounter a product in the datatype.
instance (Jason (f p), Jason (g p)) => Jason ((f :*: g) p) where
  jasonShow (x :*: y) = jasonShow x ++ "," ++ jasonShow y

renderShape :: Shape -> String
renderShape = (++ ";") . jasonShow . from

To output commas for products and the name of the constructor, we special cased those instances:

  • M1 C c f p is a type for the meta-information (M) for a constructor (C) wrapped around a value f p. Calling conName on a value of this type will helpfully return a string representing the name of the constructor.

  • (f :*: g) p is a type for the product of two values of type f p and g p. Earlier, when we queried for the type of our rectangle, you can see this appear in a couple of places. For example M1 S NoSelector (Rec0 Float) :*: M1 S NoSelector (Rec0 Float) (I expanded out the type S1 = M1 S alias). You can think of :*: as the glue between the numbers in Rectangle 1 2 2.

  • Our base cases in the recursion are Int and Float. We can keep defining more base cases, but our shape is just ints and floats.

This is pretty weird code! You don't see so so many esoteric datatypes with weird two-character names everyday. But we did arrive at this nice result:

λ> renderShape (Rectangle 1 2 2)
"Rectangle,1,2.0,2.0;"
λ> renderShape (Circle 1 2)
"Circle,1,2.0;"
λ> renderShape (Ellipse 1 2 2)
"Ellipse,1,2.0,2.0;"

It's also a lot of code. But it's generic code, which means you can reuse it with other, non-Shape datatypes.

on language extensions

  • DeriveGeneric: allows derive (Generic), which automatically instances Generic for our datatype Shape.

  • FlexibleContexts: without this, we can't say instance (Jason (f p), Constructor c) => Jason (M1 C c f p). The Haskell spec disallows constraints like Jason (f p). Jason p is OK, but Jason (f p) is no go. Fortunately, GHC is flexible.

  • FlexibleInstances: allows destruct M1 i c f p into three different instances: M1 C c f p, M1 D c f p, and M1 S c f p. Normally disallowed by the spec.

  • TypeOperators: allows the usage of infix type functions :*: and :+:.

see

Upvotes: 3

amalloy
amalloy

Reputation: 91867

You could derive Data.Data for your ADT, which exposes a generic descriptor for any custom data type. Then you can write a function that renders any instance of Data in the not-quite-like-Show fashion you need. It's certainly overkill for a single smallish type like Shape, but you could reuse it for multiple, larger types.

I'm not totally familiar with the API for doing so, but http://chrisdone.com/posts/data-typeable ends with an example of writing a generic function with a signature close to gshow :: Data d => d -> String.

Upvotes: 2

Related Questions