Reputation: 6798
I recently implemented mmap-based file reading and immediately ran into strange behavior. The relevant code is:
-- | map whole aedat file into memory and return it as a vector of events
-- TODO what are the finalizing semantics of this?
mmapAERData :: S.Storable a => FilePath -> IO (S.Vector (AER.Event a))
mmapAERData name = do
  -- mmap file into memory and find the offset behind the header
  bs <- dropHeader <$> mmapFileByteString name Nothing
  -- some conversion is necessary to get the 'ForeignPtr' from
  -- a 'ByteString'
  B.unsafeUseAsCString bs $ \ptr -> do
    fptr <- newForeignPtr_ ptr
    let count = B.length bs `div` 8 -- sizeof one event
    return $ S.unsafeFromForeignPtr0 (castForeignPtr fptr) count
Some explanation: The AEDat format is basically a long list of pairs of Word32s. One encodes the address, the other the timestamp. Before that there are some lines of header text that I drop in the dropHeader function. I could do this directly on a ForeignPtr if absolutely necessary, but I prefer to use the common function that works on ByteStrings instead.
The Storable instances can be found here and here. I am not sure about the alignment, but I suspect that an alignment of 8 should be correct.
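For illustration, a minimal sketch of what a Storable instance for one 8-byte event could look like. The RawEvent type is hypothetical and is not the linked instance. It assumes the address word comes first and reads the words in native byte order, which may not match the file. For a pair of Word32 fields the natural alignment is that of Word32 (typically 4) rather than 8, so the sketch uses that.

```haskell
import Data.Word (Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (Storable (..))

-- | Hypothetical raw event: address word followed by timestamp word.
data RawEvent = RawEvent Word32 Word32
  deriving (Eq, Show)

instance Storable RawEvent where
  sizeOf _ = 8                                   -- two 4-byte words
  alignment _ = alignment (undefined :: Word32)  -- alignment of the widest field
  peek ptr = RawEvent <$> peekByteOff ptr 0 <*> peekByteOff ptr 4
  poke ptr (RawEvent a t) = pokeByteOff ptr 0 a >> pokeByteOff ptr 4 t

main :: IO ()
main = do
  -- write one event into temporary memory and read it back
  roundTripped <- alloca $ \ptr -> poke ptr (RawEvent 42 7) >> peek ptr
  print roundTripped  -- prints: RawEvent 42 7
```

A stricter alignment of 8 is only safe if the data really starts on an 8-byte boundary, which is not guaranteed once a header of arbitrary length has been dropped.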
Reading the data works quite well, but after some time the memory seems to get corrupted somehow:
>>> es <- DVS.mmapDVSData "dataset.aedat"
>>> es S.! 1000
Event {address = Address {polarity = D, posX = 6, posY = 50}, timestamp = 74.771407s}
>>> :type es
es :: S.Vector (DVS.Event DVS.Address)
>>> _ <- evaluate (V.convert es :: V.Vector (DVS.Event DVS.Address))
>>> es S.! 1000
Event {address = Address {polarity = D, posX = 0, posY = 44}, timestamp = 0s}
Apparently accessing all elements of es somehow corrupts my memory. Or the garbage collector recycles it? Either way, this is strange. What can I do about that?
Upvotes: 3
Views: 335
Reputation: 5325
mmapFileByteString performs an mmap, which creates a ForeignPtr, and sticks that ForeignPtr into a ByteString. unsafeUseAsCString coerces the ForeignPtr into a Ptr, from which you then create a new ForeignPtr. Then you take that second ForeignPtr and use it with S.unsafeFromForeignPtr0 to create a vector.
Having two ForeignPtrs pointing at the same memory is a no-no. The GHC runtime treats them as two separate objects. After all references to the ByteString are gone, the finalizer for its ForeignPtr will be called, deallocating the mmap and reclaiming the underlying memory. This leaves the second ForeignPtr, which was created with newForeignPtr_ and has no finalizer of its own, pointing at an invalid region.
The solution here is to use Data.ByteString.Internal.toForeignPtr to extract and re-use the ForeignPtr from the ByteString. Replace the unsafeUseAsCString block with this:
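  let (fptr, offset, len) = Data.ByteString.Internal.toForeignPtr bs
      -- dropHeader may leave a non-zero offset, so advance the pointer past it
      -- (plusForeignPtr from Foreign.ForeignPtr shares the original finalizer)
      count = len `div` 8 -- size of one event in bytes
  return $ S.unsafeFromForeignPtr0 (castForeignPtr (fptr `plusForeignPtr` offset)) count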
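As a small self-contained sketch of the same idea, using only base and bytestring, the helper below (payloadPtr is a hypothetical name) reuses the ByteString's own ForeignPtr instead of creating a second one. Depending on the bytestring version, the offset returned by toForeignPtr can be non-zero after something like B.drop, so the sketch applies it with plusForeignPtr, which keeps the original finalizer attached.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, plusForeignPtr, withForeignPtr)
import Foreign.Storable (peek)

-- | Expose the payload of a 'ByteString' as its own 'ForeignPtr' plus a length.
-- No new finalizer-less pointer is created, so the memory stays alive
-- for as long as the returned 'ForeignPtr' is reachable.
payloadPtr :: B.ByteString -> (ForeignPtr Word8, Int)
payloadPtr bs =
  let (fptr, offset, len) = BI.toForeignPtr bs
  in (fptr `plusForeignPtr` offset, len)

main :: IO ()
main = do
  -- drop a 3-byte stand-in "header" from an 11-byte buffer
  let bs = B.drop 3 (B.pack [0 .. 10])
      (fptr, len) = payloadPtr bs
  firstByte <- withForeignPtr fptr peek
  print (firstByte, len)  -- prints: (3,8)
```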
IMHO, the real solution here is not to fiddle with all this stuff at all. Just read the file into a ByteString conventionally, pull out 8-byte substrings from it, and manually convert them into Events. All this mmap and ForeignPtr stuff is dangerous, and not a whole lot faster than doing things safely and correctly. If you want absolute fastest performance without regard to safety, program in C.
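A rough sketch of that conventional approach, using only bytestring. The RawEvent type and the helper names are hypothetical, and the big-endian byte order and address-then-timestamp layout are assumptions that should be checked against the actual file format.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- | Hypothetical raw event: address word followed by timestamp word.
data RawEvent = RawEvent { rawAddress :: Word32, rawTimestamp :: Word32 }
  deriving (Eq, Show)

-- | Decode a Word32 from the first four bytes, assuming big-endian order.
word32BE :: B.ByteString -> Word32
word32BE = B.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 . B.take 4

-- | Split the payload into 8-byte chunks and decode each one.
-- Trailing bytes that do not form a full event are ignored.
decodeEvents :: B.ByteString -> [RawEvent]
decodeEvents bs
  | B.length bs < 8 = []
  | otherwise =
      let (chunk, rest) = B.splitAt 8 bs
      in RawEvent (word32BE chunk) (word32BE (B.drop 4 chunk)) : decodeEvents rest

main :: IO ()
main =
  -- two full events followed by one stray byte
  print (decodeEvents (B.pack [0,0,0,1, 0,0,0,2, 0,0,1,0, 0,0,0,3, 9]))
```

In a real program the input would come from B.readFile followed by the existing dropHeader, and the resulting list could be converted to whichever vector type is needed.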
Upvotes: 1