Reputation: 48644
Let's assume that I have a big file (500 GB+) and a data record declaration `Sample` which represents a row in that file:

```haskell
data Sample = Sample {
    field1 :: Int,
    field2 :: Int
}
```

Now, what data structure is suitable for processing (filter/map/fold) a collection of these `Sample` values? Don Stewart has answered here that they should not be treated as a list `[Sample]` but as a `Vector`. My question is: how does representing them as a `Vector` solve the problem? Won't representing the file contents as a vector of `Sample` values also occupy around 500 GB?

What is the recommended method for solving these types of problems?
Upvotes: 2
Views: 151
Reputation: 105876
As far as I can see, the operations you want to use (`filter`, `map` and `fold`) can be done via both conduit (see `Data.Conduit.List`) and pipes (see `Pipes.Prelude`).

Both libraries are perfectly capable of mapping, folding and filtering streaming data, and depending on your scenario they might solve your actual problem.
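To illustrate the streaming idea those libraries generalise, here is a minimal base-only sketch: parse, filter and strictly fold one row at a time, so only a single `Sample` is alive at any moment and memory use stays constant regardless of file size. The row format and the `parseSample` helper are made up for illustration; conduit/pipes add proper resource handling on top of this pattern.

```haskell
import Data.List (foldl')

data Sample = Sample { field1 :: Int, field2 :: Int }

-- Hypothetical decoder: a row is two whitespace-separated Ints.
parseSample :: String -> Sample
parseSample s = let [a, b] = words s in Sample (read a) (read b)

-- Parse, filter and fold strictly over the stream of rows; the
-- strict foldl' means no thunks pile up and each row can be
-- garbage-collected as soon as it has been consumed.
sumField2 :: [String] -> Int
sumField2 =
    foldl' (\acc smp -> acc + field2 smp) 0
  . filter ((> 0) . field1)
  . map parseSample

main :: IO ()
main = print (sumField2 ["1 10", "-2 20", "3 30"])  -- prints 40
```

In a real program the `[String]` would come lazily from the file (or, with conduit/pipes, from a `Source`/`Producer`), but the shape of the pipeline is the same.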
If, however, you need to inspect the values several times, you're better off loading chunks into a vector, as @Don said.
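A minimal sketch of that chunked approach, assuming the `vector` package (the `chunksOf` helper and the two example passes are hypothetical illustrations): each chunk is materialised as a `Vector` so it can be traversed several times cheaply, while the rest of the file never needs to be in memory at once.

```haskell
import qualified Data.Vector as V

data Sample = Sample { field1 :: Int, field2 :: Int }

-- Split the row stream into fixed-size pieces (n must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Materialise each chunk as a Vector and traverse it twice
-- (a sum and a maximum), which a one-shot stream could not do
-- without re-reading the file.
processChunks :: Int -> [Sample] -> [Int]
processChunks n rows =
  [ V.sum (V.map field2 chunk) + V.maximum (V.map field1 chunk)
  | chunk <- map V.fromList (chunksOf n rows) ]

main :: IO ()
main = print (processChunks 2 [Sample 1 10, Sample 2 20, Sample 3 30])
```

Only one chunk's worth of `Sample` values is resident at a time, so the chunk size lets you trade memory for how much data each multi-pass computation can see.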
Upvotes: 3