Santiago Cuellar
Santiago Cuellar

Reputation: 405

Understanding the implementation of Data.Word

I'm trying to use Data.Word but I don't even understand its source code. Here are some precise questions I have, but if you have a better resource for using Word or similar libraries, that might be helpful too.

For reference, let's look at the implementation of Word8 (source)

data {-# CTYPE "HsWord8" #-} Word8 = W8# Word#
  1. What are the #s? As far as I can tell it's not part of the name or a regular function.
  2. What is the declaration before Word8 ({-# CTYPE "HsWord8" #-})? I have seen those as language declarations at the begining of files but never in a definition.
  3. As far as I can tell W8 or W8# (I don't even know how to parse it) is not defined anywhere else on the file or imported. Is it being implicitly defined here or em I missing something?
  4. Similarly Word# is used in all definitions of Word, but I don't see it defined anywhere... where is it coming from and how can I see its definition?

Upvotes: 1

Views: 237

Answers (2)

K. A. Buhr
K. A. Buhr

Reputation: 51029

Well, @DanielWagner covered most of this, but I was just about done writing this up, so maybe it'll provide some additional detail... I was originally confused about the nature of the "definitions" in GHC.Prim, so I've updated my answer with a correction.

You should be able to effectively use the Data.Word types without understanding the source code in GHC.Word.

Data.Word just provides a family of unsigned integral types of fixed bitsize (Word8, Word16, Word32, and Word64) plus a Word type of "default size" (same size as Int, so 64 bits on 64-bit architectures). Because these types all have Num and Integral instances, the usual operations on integers are available, and overflow is handled the usual way. If you want to use them as bit fields, then the facilities in Data.Bits will be helpful.

In particular, I don't see anything in the GHC.Word source that could possibly help you write "normal" code using these types.

That being said, the # character is not normally allowed in identifiers, but it can be permitted (only as a final character, so W# is okay but not bad#Identifier) by enabling the MagicHash extension. There is nothing special about such identifiers EXCEPT that specific identifiers are treated "magically" by the GHC compiler, and by convention these magic identifiers, plus some other identifiers that aren't actually "magic" but are intended for internal use only, use a final # character to mark them as special so they don't accidentally get used by someone who is trying to write "normal" Haskell code.

To illustrate, in the definition:

data {-# CTYPE "HsWord8" #-} Word8 = W8# Word#

the identifier W8# is not magic. It's just a regular constructor that's intended only for internal, or at least advanced, use. On the other hand, Word# is magic. It's internally defined by GHC as an "unboxed" unsigned integer (64 bits on 64-bit architectures) where "unboxed" here means that it's stored directly in memory in an 8-byte field without an extra field for its constructor.

You can find a nonsensical "definition", of sorts, in the source code for GHC.Prim:

data Word#

In normal Haskell code, this would define a data type Word# with no constructor. Such a data type would be "uninhabited", meaning it has no possible values. However, this definition isn't actually used. This GHC.Prim source code is automatically generated for the benefit of the Haddock documentation utility. Instead, GHC.Prim is a sort of virtual module, and its "real" implementation is build into the GHC compiler.

How do you know which identifiers ending in # are magic and which aren't? Well, you don't know just by looking at the names. I believe you can reliably tell by checking in GHCi if they are defined in the virtual GHC.Prim module:

> :set -XMagicHash
> import GHC.Prim
> :i Word#
data Word# :: TYPE 'GHC.Types.WordRep   -- Defined in ‘GHC.Prim’

Anything defined in GHC.Prim is magic, and anything else isn't. In the generated GHC.Prim source, these magic identifiers will show up with nonsense definitions like:

data Foo#

or:

bar# = bar#

Constructs of the form {-# WHATEVER #-} are compiler pragmas. They provide special instructions to the compiler that relate to the source file as a whole or, usually, to "nearby" Haskell code. Some pragmas are placed at the top of the source file to enable language extensions or set compiler flags:

{-# LANGUAGE FlexibleInstances #-}
{-# OPTIONS_GHC -Wall #-}

Others are interleaved with Haskell code to influence the compiler's optimizations:

double :: Int -> Int
{-# NOINLINE double #-}
double x = x + x

or to specify special memory layout or handling of data structures:

data MyStructure = MyS {-# UNPACK #-} !Bool {-# UNPACK #-} !Int

These pragmas are documented in the GHC manual. The CTYPE pragma is a rather obscure pragma that relates to how the Word type will be interpreted when used with the foreign function interface and the capi calling convention. If you aren't planning to call C functions from a Haskell program using the capi calling convention, you can ignore it.

Upvotes: 3

Daniel Wagner
Daniel Wagner

Reputation: 153102

What are the #s?

They are only marginally more special than the Ws, os, rs, and ds -- just part of the name. Standard Haskell doesn't allow this in a name, but it's just a syntactic extension (named MagicHash) -- nothing deep happening here. As a convention, GHC internals use # suffixes on types to indicate they are unboxed, and # suffixes on constructors to indicate they box up unboxed types, but these are just conventions, and not enforced by the compiler or anything like that.

What is the declaration before Word8 ({-# CTYPE "HsWord8" #-})?

CTYPE declares that, when using the foreign function interface to marshall this type to C, the appropriate C type to marshall it to is HsWord8 -- a type defined in the GHC runtime's C headers.

As far as I can tell W8 or W8# (I don't even know how to parse it) is not defined anywhere else on the file or imported. Is it being implicitly defined here?

Well, it is being defined there, but I wouldn't call it implicit; it's quite explicit! Consider this typical Haskell data declaration:

data Foo = Bar Field1 Field2

It defines two new names: Foo, a new type at the type level, and Bar, a new function at the computation level which takes values of type Field1 and Field2 and constructs a value of type Foo. Similarly,

data Word8 = W8# Word#

defines a new type Word8 and a new constructor function W8# :: Word# -> Word8.

Similarly Word# is used in all definitions of Word, but I don't see it defined anywhere... where is it coming from and how can I see it's definition.

Word# may be imported from GHC.Exts. You can discover this yourself via Hoogle. It is a compiler primitive, so while it is possible to look at its source, the thing you would be looking at would be metacode, not code: it would not be valid Haskell code declaring the type with a standard data declaration and listing constructors, but rather some combination of C code and Haskell code describing how to lay out bits in memory, emit assembly instructions for modifying it, and interact with the garbage collector.

Upvotes: 3

Related Questions