Haskell Access Tuple Data Inside List Comprehension

Question

I have defined a custom type as follows:

import Data.ByteString (ByteString)
import Data.Word (Word8)

-- Atom reference number, x coordinate, y coordinate, z coordinate, element symbol,
--      atom name, residue sequence number, amino acid abbreviation
type Atom = (Int, Double, Double, Double, Word8, ByteString, Int, ByteString)

I would like to gather all of the atoms with a certain residue sequence number nm.

This would be nice:

[x | x <- p, d == nm]
where
    (_, _, _, _, _, _, d, _) = x

where p is a list of atoms.

However, this does not work: x is only in scope inside the list comprehension, so the where clause cannot refer to it, and I cannot think of a way to access a specific tuple element from inside the list comprehension.

Is there a tuple method I am missing, or should I be using a different data structure?

I know I could write a recursive function that unpacks and checks every tuple in the list p, but this code sits inside a function that is already recursive, so I would rather not introduce that extra complexity.

ehird · Accepted Answer

This works, using an as-pattern (x@...) so that x names the whole tuple while the pattern also binds d:

[x | x@(_, _, _, _, _, _, d, _) <- p, d == nm]
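A minimal, self-contained sketch of the tuple version, assuming the `Atom` synonym from the question. The helper name `atomsWithSeqNum` and the atoms in `sampleAtoms` are hypothetical, and using atomic numbers (7 for nitrogen, 6 for carbon) as the `Word8` element code is an assumption. The generator's pattern destructures each tuple, so `d` is in scope for the guard and `x` names the whole tuple:

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Atom reference number, x, y, z, element symbol, atom name,
-- residue sequence number, amino acid abbreviation (as in the question).
type Atom = (Int, Double, Double, Double, Word8, ByteString, Int, ByteString)

-- Hypothetical helper: keep every atom whose residue sequence number is nm.
-- The as-pattern x@(...) binds the whole tuple; the inner pattern binds d.
atomsWithSeqNum :: Int -> [Atom] -> [Atom]
atomsWithSeqNum nm p = [x | x@(_, _, _, _, _, _, d, _) <- p, d == nm]

-- Hypothetical sample data; all values are invented for illustration,
-- and the element codes assume atomic numbers are used.
sampleAtoms :: [Atom]
sampleAtoms =
  [ (1, 0.0, 0.0, 0.0, 7, BC.pack "N", 1, BC.pack "MET")
  , (2, 1.5, 0.0, 0.0, 6, BC.pack "CA", 1, BC.pack "MET")
  , (3, 3.0, 0.5, 0.0, 7, BC.pack "N", 2, BC.pack "GLY")
  ]
```

Loaded in GHCi, `atomsWithSeqNum 1 sampleAtoms` keeps the first two atoms and `atomsWithSeqNum 2 sampleAtoms` keeps only the third.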

However, you should really define your own data type here. A three-element tuple is suspicious; an eight-element tuple is very bad news indeed. Tuples are difficult to work with and less type-safe than data types (if you represent two different kinds of data with two tuples with the same element types, they can be used interchangeably). Here's how I'd write Atom as a record:

import Data.ByteString (ByteString)
import Data.Word (Word8)

data Point3D = Point3D Double Double Double
  deriving (Eq, Show)

data Atom = Atom
  { atomRef :: Int
  , atomPos :: Point3D
  , atomSymbol :: Word8
  , atomName :: ByteString
  , atomSeqNum :: Int
  , atomAcidAbbrev :: ByteString
  } deriving (Eq, Show)

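(The "atom" prefix avoids clashes with the field names of other records: each field name also defines a top-level accessor function, so two records in the same module cannot share a field name.)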

You can then write the list comprehension as follows:

[x | x <- p, atomSeqNum x == nm]
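A sketch of the record version, reusing the definitions above; `atomsWithSeqNum`, `atomsWithSeqNum'` and `sampleAtoms` are hypothetical names, and the sample values are invented. It also shows an equivalent formulation with `filter`, which some people find clearer when the comprehension has a single generator and a single guard:

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

data Point3D = Point3D Double Double Double
  deriving (Eq, Show)

data Atom = Atom
  { atomRef :: Int
  , atomPos :: Point3D
  , atomSymbol :: Word8
  , atomName :: ByteString
  , atomSeqNum :: Int
  , atomAcidAbbrev :: ByteString
  } deriving (Eq, Show)

-- Hypothetical helper wrapping the comprehension from the answer.
atomsWithSeqNum :: Int -> [Atom] -> [Atom]
atomsWithSeqNum nm p = [x | x <- p, atomSeqNum x == nm]

-- The same selection written with filter and the field accessor.
atomsWithSeqNum' :: Int -> [Atom] -> [Atom]
atomsWithSeqNum' nm = filter ((== nm) . atomSeqNum)

-- Hypothetical sample data built with the positional constructor.
sampleAtoms :: [Atom]
sampleAtoms =
  [ Atom 1 (Point3D 0.0 0.0 0.0) 7 (BC.pack "N") 1 (BC.pack "MET")
  , Atom 2 (Point3D 1.5 0.0 0.0) 6 (BC.pack "CA") 1 (BC.pack "MET")
  , Atom 3 (Point3D 3.0 0.5 0.0) 7 (BC.pack "N") 2 (BC.pack "GLY")
  ]
```

Because `atomSeqNum` is an ordinary function of type `Atom -> Int`, it composes directly with `(== nm)`; no tuple pattern is needed.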

As a bonus, your definition of Atom becomes self-documenting, and you reap the benefits of increased type safety. Here's how you'd create an Atom using this definition:

myAtom = Atom
  { atomRef = ...
  , atomPos = ...
  , ... etc. ...
  }
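One way the elided fields might be filled in, purely for illustration: every value below is invented, `myAtom` is a hypothetical name, and using an atomic number as the element code is an assumption. Record syntax lets the fields appear in any order:

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

data Point3D = Point3D Double Double Double
  deriving (Eq, Show)

data Atom = Atom
  { atomRef :: Int
  , atomPos :: Point3D
  , atomSymbol :: Word8
  , atomName :: ByteString
  , atomSeqNum :: Int
  , atomAcidAbbrev :: ByteString
  } deriving (Eq, Show)

-- Hypothetical atom: a backbone nitrogen in residue 1 (a methionine).
-- 7 is nitrogen's atomic number, assumed here to be the element code.
myAtom :: Atom
myAtom = Atom
  { atomRef = 1
  , atomPos = Point3D 0.0 0.0 0.0
  , atomSymbol = 7
  , atomName = BC.pack "N"
  , atomSeqNum = 1
  , atomAcidAbbrev = BC.pack "MET"
  }
```

If a field is omitted, GHC warns by default rather than rejecting the program, and reading that field at run time throws an exception, so that warning is worth heeding.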

By the way, it's probably a good idea to make some of the fields of these types strict, which you do by putting an exclamation mark before the field's type. A strict field is evaluated whenever its constructor is, which helps avoid space leaks from unevaluated thunks building up. For instance, since it doesn't make much sense to evaluate a Point3D without also evaluating all its components, I would instead define Point3D as:

data Point3D = Point3D !Double !Double !Double
  deriving (Eq, Show)
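To see what the exclamation marks change, here is a small experiment; the types `LazyPoint` and `StrictPoint` and the helper `throwsWhenForced` are hypothetical. `evaluate` forces a value only to weak head normal form (its outermost constructor), which is exactly when strict fields get evaluated:

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Two hypothetical variants of Point3D that differ only in field strictness.
data LazyPoint = LazyPoint Double Double Double
data StrictPoint = StrictPoint !Double !Double !Double

-- Evaluates a value to weak head normal form and reports whether
-- doing so raised an ErrorCall (the exception thrown by `error`).
throwsWhenForced :: a -> IO Bool
throwsWhenForced v = either isErr (const False) <$> try (evaluate v)
  where
    isErr :: ErrorCall -> Bool
    isErr _ = True
```

Evaluating `LazyPoint 1 2 (error "missing coordinate")` succeeds because the third component is stored as an unevaluated thunk, whereas evaluating the corresponding `StrictPoint` forces the component and raises the error. The same forcing is what stops thunks from accumulating inside strict fields.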

It would probably be a good idea to make most of the fields of Atom strict too, though perhaps not all of them. For example, the ByteString fields should be left non-strict if they're generated by the program, not always accessed, and possibly large. On the other hand, if their values are read from a file, they should probably be made strict.
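A sketch of one possible split, assuming the `ByteString` fields are generated, large, and not always read, so they stay lazy while the small numeric fields are strict. The value `atomWithUnusedName` is hypothetical, and its `error` call stands in for an expensive computation that never runs:

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

data Point3D = Point3D !Double !Double !Double
  deriving (Eq, Show)

-- Numeric fields are strict; the ByteString fields stay lazy on the
-- assumption that they are generated, large, and not always accessed.
data Atom = Atom
  { atomRef :: !Int
  , atomPos :: !Point3D
  , atomSymbol :: !Word8
  , atomName :: ByteString
  , atomSeqNum :: !Int
  , atomAcidAbbrev :: ByteString
  } deriving (Eq, Show)

-- Hypothetical atom whose name is never computed. Because atomName is lazy,
-- building the atom and reading its other fields never forces the error.
atomWithUnusedName :: Atom
atomWithUnusedName = Atom
  { atomRef = 1
  , atomPos = Point3D 0.0 0.0 0.0
  , atomSymbol = 7
  , atomName = error "atom name was never needed"
  , atomSeqNum = 1
  , atomAcidAbbrev = BC.pack "MET"
  }
```

Reading `atomSeqNum atomWithUnusedName` works fine; only demanding `atomName` would raise the error. If the names were read from a file instead, adding `!` to those two fields would be the natural change.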
