me2

Reputation: 3119

Using Haskell's Parsec to parse binary files?

Parsec is designed to parse textual information, but it occurs to me that Parsec could also be suitable to do binary file format parsing for complex formats that involve conditional segments, out-of-order segments, etc.

Is there an ability to do this or a similar, alternative package that does this? If not, what is the best way in Haskell to parse binary file formats?

Upvotes: 14

Views: 4195

Answers (5)

Don Stewart

Reputation: 137937

The key tools for parsing binary files are Data.Binary (the binary package), cereal, and attoparsec.

Binary is the most general solution, Cereal can be great for limited data sizes, and attoparsec is perfectly fine for e.g. packet parsing. All of these are aimed at very high performance, unlike Parsec. There are many examples on hackage as well.
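To make the packet-parsing case concrete, here is a minimal attoparsec sketch over a strict ByteString. The packet layout (magic bytes, a flags byte, an optional extension field, a length-prefixed payload) and the names `Packet`, `word16be`, `packet`, and `parsePacket` are made up for illustration; only the `Data.Attoparsec.ByteString` functions are real library calls.

```haskell
import qualified Data.Attoparsec.ByteString as A
import Data.Bits (shiftL, testBit, (.|.))
import qualified Data.ByteString as B
import Data.Word (Word16, Word8)

-- Hypothetical layout (an assumption, not a real format):
--   2 bytes  magic "BF"
--   1 byte   flags
--   2 bytes  big-endian extension, present only if bit 0 of flags is set
--   1 byte   payload length, followed by that many payload bytes
data Packet = Packet
  { flags     :: Word8
  , extension :: Maybe Word16
  , payload   :: B.ByteString
  } deriving (Eq, Show)

-- Read two bytes and combine them as a big-endian 16-bit word.
word16be :: A.Parser Word16
word16be = do
  hi <- A.anyWord8
  lo <- A.anyWord8
  pure ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo)

packet :: A.Parser Packet
packet = do
  _   <- A.string (B.pack [0x42, 0x46])        -- magic bytes "BF"
  fl  <- A.anyWord8
  ext <- if testBit fl 0                       -- conditional segment
           then Just <$> word16be
           else pure Nothing
  len  <- A.anyWord8
  body <- A.take (fromIntegral len)            -- length-prefixed payload
  pure (Packet fl ext body)

-- Run the parser on a complete strict ByteString.
parsePacket :: B.ByteString -> Either String Packet
parsePacket = A.parseOnly packet
```

The conditional segment is just ordinary monadic code: the parser inspects a value it has already read and decides what to parse next, which is the same style Parsec offers for text.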

Upvotes: 12

Paul Johnson

Reputation: 17786

The best approach depends on the format of the binary file.

Many binary formats are designed to make parsing easy (unlike text formats that are primarily to be read by humans). So any union data type will be preceded by a discriminator that tells you what type to expect, all fields are either fixed length or preceded by a length field, and so on. For this kind of data I would recommend Data.Binary; typically you create a matching Haskell data type for each type in the file, and then make each of those types an instance of Binary. Define the "get" method for reading; it returns a "Get" monad action which is basically a very simple parser. You will also need to define a "put" method.
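As a rough sketch of that pattern, assuming a made-up record format with a one-byte discriminator and a length-prefixed field (the `Record` type, its tags, and `roundTrip` are hypothetical; the `get`/`put` primitives come from the binary package):

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getByteString, getWord16be, getWord32be, getWord8)
import Data.Binary.Put (putByteString, putWord16be, putWord32be, putWord8)
import qualified Data.ByteString as B
import Data.Word (Word32)

-- Hypothetical format: one tag byte, then a tag-specific body.
data Record
  = Counter Word32      -- tag 0: 32-bit big-endian integer
  | Blob B.ByteString   -- tag 1: 16-bit big-endian length, then the bytes
  deriving (Eq, Show)

instance Binary Record where
  put (Counter n) = putWord8 0 >> putWord32be n
  put (Blob bs) = do
    putWord8 1
    -- Assumes the payload length fits in 16 bits.
    putWord16be (fromIntegral (B.length bs))
    putByteString bs

  get = do
    tag <- getWord8                       -- the discriminator
    case tag of
      0 -> Counter <$> getWord32be
      1 -> do
        len <- getWord16be                -- the length field
        Blob <$> getByteString (fromIntegral len)
      _ -> fail ("unknown record tag: " ++ show tag)

-- Serialise to a lazy ByteString and read it back.
roundTrip :: Record -> Record
roundTrip = decode . encode
```

Note that `decode` throws on malformed input; `decodeOrFail` from the same module returns an `Either` instead, which is usually the safer choice for untrusted files.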

On the other hand if your binary data doesn't fit into this kind of world then you will need attoparsec. I've never used that, so I can't comment further, but this blog post is very positive.

Upvotes: 1

Edward Kmett

Reputation: 29962

It works fine, though you might want to use Parsec 3, Attoparsec, or Iteratees. Parsec's reliance on String as its intermediate representation may bloat your memory footprint quite a bit, whereas the others can be configured to use ByteStrings.

Iteratees are particularly attractive because it is easier to ensure they won't hold onto the beginning of your input, and they can be fed chunks of data incrementally as they become available. This prevents you from having to read the entire input into memory in advance and lets you avoid other nasty workarounds like lazy IO.
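The iteratee libraries each have their own API, so as a stand-in the sketch below shows the same chunk-at-a-time idea using attoparsec's incremental interface instead. This is not an iteratee implementation; the message layout and the names `message` and `feedChunks` are assumptions for illustration.

```haskell
import qualified Data.Attoparsec.ByteString as A
import qualified Data.ByteString as B

-- Hypothetical message: one length byte, then that many payload bytes.
message :: A.Parser B.ByteString
message = A.anyWord8 >>= A.take . fromIntegral

-- Feed chunks one at a time, stopping as soon as the parser finishes or fails.
-- Empty chunks are dropped first, because attoparsec treats an empty chunk
-- passed to a Partial continuation as end of input.
feedChunks :: [B.ByteString] -> Either String B.ByteString
feedChunks chunks =
  case filter (not . B.null) chunks of
    []       -> Left "no input"
    (c : cs) -> go (A.parse message c) cs
  where
    go (A.Done _ r)     _        = Right r
    go (A.Fail _ _ err) _        = Left err
    go (A.Partial k)    (x : xs) = go (k x) xs
    go (A.Partial k)    []       = finish (k B.empty)  -- signal end of input

    finish (A.Done _ r)     = Right r
    finish (A.Fail _ _ err) = Left err
    finish (A.Partial _)    = Left "parser still wants input"
```

In a real program the chunks would come from a socket or a file handle rather than a list, so only the unconsumed part of the current chunk needs to stay in memory.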

Upvotes: 2

Chris Eidhof

Reputation: 1544

You might be interested in attoparsec, which I think was designed for this purpose.

Upvotes: 10

Luca Molteni

Reputation: 5370

I've used Data.Binary successfully.

Upvotes: 4
