donatello

Reputation: 6255

How to serialize/deserialize a hashmap?

I have a large HashMap containing millions of entries, and I want to persist it to disk so that, when I read it back, I don't pay the overhead of re-inserting every key-value pair into the map.

I am trying to use the cereal library to do this, but it appears that the HashMap type would need a Generic instance for that. Is there a way to get one?

Upvotes: 10

Views: 2154

Answers (5)

mergeconflict

Reputation: 8276

You might be able to use standalone deriving (the StandaloneDeriving extension) to generate your own Generic instance for HashMap. You'll probably get a warning about orphan instances, but you probably don't care :) I haven't tried this, but it's probably worth a shot.

Upvotes: 5

unhammer

Reputation: 4740

If you can use binary instead, there's binary-orphans, which provides instances for unordered-containers. I couldn't install binary-orphans due to a cabal conflict, so I just copied the parts I needed, e.g.:

{-# LANGUAGE CPP           #-}

module Bin where

import           Data.Binary
import           Data.ByteString.Lazy          (ByteString)
import           Data.Hashable                 (Hashable)
import qualified Data.HashMap.Strict           as M
import qualified Data.Text                     as T

-- Older versions of text have no Binary Text instance; take the orphan
-- instance from the text-binary package in that case.
#if !(MIN_VERSION_text(1,2,1))
import           Data.Text.Binary              ()
#endif

-- Orphan instance: serialize the map as its association list.
instance  (Hashable k, Eq k, Binary k, Binary v) => Binary (M.HashMap k v) where
  get = fmap M.fromList get
  put = put . M.toList

-- Note: `encode (M.fromList [])` without a type annotation won't compile,
-- because the key and value types are ambiguous.
encodeModel :: M.HashMap T.Text Int -> ByteString
encodeModel m =
  encode m
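If you only need to persist a map and would rather avoid the orphan instance altogether, a minimal sketch (assuming the binary, hashable and unordered-containers packages; the helper names encodeMap, decodeMap, saveMap and loadMap are hypothetical) is to serialize the association list directly. Decoding still goes through M.fromList, so every pair is re-inserted on load.

```haskell
import           Data.Binary          (Binary, decode, decodeFile, encode, encodeFile)
import qualified Data.ByteString.Lazy as BL
import           Data.Hashable        (Hashable)
import qualified Data.HashMap.Strict  as M

-- Encode the map as its association list; no Binary instance for HashMap is needed.
encodeMap :: (Binary k, Binary v) => M.HashMap k v -> BL.ByteString
encodeMap = encode . M.toList

-- Rebuild the map from the decoded association list.
-- fromList re-inserts (and rehashes) every pair.
decodeMap :: (Eq k, Hashable k, Binary k, Binary v) => BL.ByteString -> M.HashMap k v
decodeMap = M.fromList . decode

-- File-based variants for persisting the map to disk.
saveMap :: (Binary k, Binary v) => FilePath -> M.HashMap k v -> IO ()
saveMap path = encodeFile path . M.toList

loadMap :: (Eq k, Hashable k, Binary k, Binary v) => FilePath -> IO (M.HashMap k v)
loadMap path = M.fromList <$> decodeFile path
```

For example, saveMap "model.bin" m writes the map, and loadMap "model.bin" reads it back; as with encode above, the key and value types must be fixed by an annotation or by how the result is used.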

Upvotes: 0

Carlos Reyes

Reputation: 111

The CerealPlus package provides a definition of Serialize for strict HashMaps.

http://hackage.haskell.org/package/cereal-plus

Upvotes: -1

donatello

Reputation: 6255

Currently, there is no way to make HashMap serializable without modifying the HashMap library itself.

It is not possible to make HashMap an instance of Generic (for use with cereal) using standalone deriving, as suggested in @mergeconflict's answer, because Data.HashMap does not export all its constructors, and GHC requires them to be in scope to derive the instance.

So, the only solution left to serialize the HashMap seems to be to use the toList/fromList interface.
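A minimal sketch of that toList/fromList approach for cereal, written as an orphan Serialize instance (assuming the cereal, hashable and unordered-containers packages). cereal's list instance already writes a length prefix before the elements, so the map size is stored implicitly:

```haskell
{-# OPTIONS_GHC -fno-warn-orphans #-}
import           Data.Hashable       (Hashable)
import qualified Data.HashMap.Strict as M
import           Data.Serialize      (Serialize (..), decode, encode)

-- Orphan instance: the map is written as its list of (key, value) pairs
-- and rebuilt with fromList, which re-inserts every pair on load.
instance (Eq k, Hashable k, Serialize k, Serialize v) => Serialize (M.HashMap k v) where
  put = put . M.toList
  get = M.fromList <$> get
```

encode and decode from Data.Serialize then work on the map directly; decode returns Either String (M.HashMap k v), with Left carrying the parse error.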

Upvotes: 1

Tener

Reputation: 5279

I am not sure Generics are the best route to high performance. My best bet would actually be writing your own Serialize instance, like this:

instance (Eq k, Hashable k, Serialize k, Serialize v) => Serialize (HashMap k v) where
  ...

To avoid creating an orphan instance, you can use the newtype trick:

newtype SerializableHashMap k v = SerializableHashMap { toHashMap :: HashMap k v }
instance (Eq k, Hashable k, Serialize k, Serialize v) => Serialize (SerializableHashMap k v) where
  ...

The question is how to define the ... part.

There is no definite answer until you actually implement and benchmark the possible solutions.

One possible solution is to use toList/fromList functions and store/read the size of the HashMap.

The other (which would be similar to using Generics) would be to write a direct serialization based on the internal HashMap structure. Given that the internals aren't really exported, that would be a job for Generics only.
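A sketch of the first option using the SerializableHashMap newtype, with cereal's Serialize class and an explicitly stored entry count (a big-endian Word64, an arbitrary choice here), assuming the cereal, hashable and unordered-containers packages. It has not been benchmarked against anything:

```haskell
import           Control.Monad       (replicateM)
import           Data.Hashable       (Hashable)
import qualified Data.HashMap.Strict as M
import           Data.Serialize      (Serialize (..), decode, encode)
import           Data.Serialize.Get  (getWord64be)
import           Data.Serialize.Put  (putWord64be)

-- Wrapper type so the instance below is not an orphan.
newtype SerializableHashMap k v = SerializableHashMap { toHashMap :: M.HashMap k v }

instance (Eq k, Hashable k, Serialize k, Serialize v)
      => Serialize (SerializableHashMap k v) where
  put (SerializableHashMap m) = do
    putWord64be (fromIntegral (M.size m))  -- entry count first
    mapM_ put (M.toList m)                 -- then each (key, value) pair
  get = do
    n   <- getWord64be                      -- read the stored entry count
    kvs <- replicateM (fromIntegral n) get  -- read exactly n pairs
    pure (SerializableHashMap (M.fromList kvs))
```

Callers wrap the map with SerializableHashMap before encoding and unwrap it with toHashMap after decoding.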

Upvotes: 1
