Reputation: 6255
I have a large hashmap containing millions of entries, and I want to persist it to disk, so that when it is read from the disk again, I don't have the overhead of inserting the key-value pairs back into the map again.
I am trying to use the cereal library to do this, but it appears that the HashMap datatype needs to derive Generic. Is there a way to do this?
Upvotes: 10
Views: 2154
Reputation: 8276
You might be able to use stand-alone deriving to generate your own Generic instance for HashMap. You'll probably get a warning about orphan instances, but you also probably don't care :) Anyway, I haven't tried this, but it's probably worth a shot...
Upvotes: 5
Reputation: 4740
If you can use binary, there's binary-orphans which provides instances for unordered-containers. I couldn't install binary-orphans due to some cabal conflict, but just snatched the parts I needed, e.g.:
{-# LANGUAGE CPP #-}
{-# LANGUAGE DeriveGeneric #-}
module Bin where
import Data.Binary
import Data.ByteString.Lazy.Internal
import Data.Hashable (Hashable)
import qualified Data.HashMap.Strict as M
import qualified Data.Text as T
#if !(MIN_VERSION_text(1,2,1))
import Data.Text.Binary ()
#endif
-- Orphan instance: serialize the map as a list of key-value pairs.
instance (Hashable k, Eq k, Binary k, Binary v) => Binary (M.HashMap k v) where
    get = fmap M.fromList get
    put = put . M.toList
-- Note: plain `encode (M.fromList [])` without type annotations won't work
encodeModel :: M.HashMap T.Text Int -> ByteString
encodeModel m = encode m
Upvotes: 0
Reputation: 111
The CerealPlus package provides a definition of Serialize for strict HashMaps.
http://hackage.haskell.org/package/cereal-plus
Upvotes: -1
Reputation: 6255
Currently, there is no way to make HashMap serializable without modifying the HashMap library itself.
It is not possible to make HashMap an instance of Generic (for use with cereal) using stand-alone deriving, as suggested in @mergeconflict's answer, because Data.HashMap does not export all of the type's constructors, and GHC requires every constructor to be in scope in order to derive an instance.
So, the only solution left to serialize the HashMap seems to be to use the toList/fromList interface.
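As a rough illustration of the toList/fromList route with cereal, here is a minimal sketch. The function names (encodeMap, decodeMap, saveMap, loadMap) are made up for this example and are not part of cereal or unordered-containers. Note that decoding still reinserts every pair, so this does not remove the rebuild cost mentioned in the question; it only avoids needing a Generic instance.

```haskell
module HashMapList where

import qualified Data.ByteString as BS
import Data.Hashable (Hashable)
import qualified Data.HashMap.Strict as HM
import Data.Serialize (Serialize, decode, encode)

-- Hypothetical helper: serialize the map as a list of key-value pairs.
encodeMap :: (Serialize k, Serialize v) => HM.HashMap k v -> BS.ByteString
encodeMap = encode . HM.toList

-- Hypothetical helper: decode the list and rebuild the map.
-- Every pair is inserted again, so this costs a full rebuild.
decodeMap :: (Eq k, Hashable k, Serialize k, Serialize v)
          => BS.ByteString -> Either String (HM.HashMap k v)
decodeMap = fmap HM.fromList . decode

-- Write the encoded map to a file.
saveMap :: (Serialize k, Serialize v) => FilePath -> HM.HashMap k v -> IO ()
saveMap path = BS.writeFile path . encodeMap

-- Read a file and decode it; Left carries cereal's error message.
loadMap :: (Eq k, Hashable k, Serialize k, Serialize v)
        => FilePath -> IO (Either String (HM.HashMap k v))
loadMap path = decodeMap <$> BS.readFile path
```

Because no instance is declared for HashMap itself, this version produces no orphan-instance warning.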
Upvotes: 1
Reputation: 5279
I am not sure that using Generics is the best route to high performance. My best bet would actually be to write your own instance of cereal's Serialize class, like this:
instance (Eq k, Hashable k, Serialize k, Serialize v) => Serialize (HashMap k v) where
    ...
To avoid creating an orphan instance, you can use the newtype trick:
newtype SerializableHashMap k v = SerializableHashMap { toHashMap :: HashMap k v }
instance (Eq k, Hashable k, Serialize k, Serialize v) => Serialize (SerializableHashMap k v) where
    ...
The question is how to define the elided bodies (...).
There is no definite answer before you actually try and implement and benchmark possible solutions.
One possible solution is to use the toList/fromList functions and store/read the size of the HashMap.
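One way the size-prefixed idea could look with cereal is sketched below, reusing the SerializableHashMap newtype name from above. The on-disk layout (a 64-bit big-endian count followed by the pairs) and the roundTrip helper are assumptions made for this example, not something cereal prescribes. It has not been benchmarked, so treat it as a starting point rather than a recommendation.

```haskell
module HashMapSized where

import Data.Hashable (Hashable)
import qualified Data.HashMap.Strict as HM
import Data.Serialize
  (Serialize (..), decode, encode, getWord64be, putWord64be)

-- Wrapper type, so the instance below is not an orphan.
newtype SerializableHashMap k v = SerializableHashMap
  { toHashMap :: HM.HashMap k v }

instance (Eq k, Hashable k, Serialize k, Serialize v)
      => Serialize (SerializableHashMap k v) where
  -- Assumed layout: entry count first, then each (key, value) pair.
  put (SerializableHashMap m) = do
    putWord64be (fromIntegral (HM.size m))
    mapM_ put (HM.toList m)

  -- Read the count, then insert pairs one by one without
  -- materializing an intermediate list.
  get = do
    n <- getWord64be
    SerializableHashMap <$> go (fromIntegral n :: Int) HM.empty
    where
      go 0 acc = pure acc
      go i acc = do
        (k, v) <- get
        -- Force the map at each step to avoid a chain of thunks.
        go (i - 1) $! HM.insert k v acc

-- Hypothetical helper: encode and immediately decode a map.
roundTrip :: (Eq k, Hashable k, Serialize k, Serialize v)
          => HM.HashMap k v -> Either String (HM.HashMap k v)
roundTrip = fmap toHashMap . decode . encode . SerializableHashMap
```

Knowing the size up front lets the reader loop a fixed number of times, but each entry is still hashed and inserted again on load.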
The other (which would be similar to using Generics) would be to write a direct serialization based on the internal structure of the HashMap. Given that the internals are not really exported, that would be a job for Generics only.
Upvotes: 1