fho

Reputation: 6798

How to do memory mapped IO on custom data types?

The Setup

I recently implemented mmap-based file reading and immediately ran into strange behavior. The relevant code is:

-- | map whole aedat file into memory and return it as a vector of events
-- TODO what are the finalizing semantics of this?
mmapAERData :: S.Storable a => FilePath -> IO (S.Vector (AER.Event a))
mmapAERData name = do
    -- mmap file into memory and find the offset behind the header
    bs <- dropHeader <$> mmapFileByteString name Nothing
    -- some conversion is necessary to get the 'ForeignPtr' from
    -- a 'ByteString'
    B.unsafeUseAsCString bs $ \ptr -> do
      fptr <- newForeignPtr_ ptr
      let count = B.length bs `div` 8 -- sizeof one event
      return $ S.unsafeFromForeignPtr0 (castForeignPtr fptr) count


Some explanation: The AEDat format is basically a long list of pairs of Word32s: one encodes the address, the other the timestamp. Before that there are some lines of header text, which I drop in the dropHeader function. I could do this directly on a ForeignPtr if absolutely necessary, but I prefer to use the common function that works on ByteStrings instead.

The Storable instances can be found here and here. I am not sure about the alignment, but I suspect that an alignment of 8 should be correct.
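For readers without access to the linked instances, here is a minimal sketch of what a Storable instance for a two-word event could look like. The RawEvent type and the roundTrip helper are hypothetical stand-ins, not the actual AER.Event code. The sketch reuses the alignment of Word32, on the assumption that a record of two 32-bit words needs no stricter alignment than its fields. It also reads the words in host byte order, so a file stored in a different byte order would need an extra swap.

```haskell
import Data.Word (Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (Storable (..))

-- | Hypothetical stand-in for the event type in the question:
-- one raw address word followed by one raw timestamp word.
data RawEvent = RawEvent
  { rawAddress   :: !Word32
  , rawTimestamp :: !Word32
  } deriving (Eq, Show)

instance Storable RawEvent where
  -- Two 32-bit words, stored back to back.
  sizeOf _ = 8
  -- Assumption: the record needs no stricter alignment than its fields.
  alignment _ = alignment (undefined :: Word32)
  -- Address at byte offset 0, timestamp at byte offset 4 (host byte order).
  peek ptr = RawEvent <$> peekByteOff ptr 0 <*> peekByteOff ptr 4
  poke ptr (RawEvent a t) = pokeByteOff ptr 0 a >> pokeByteOff ptr 4 t

-- | Write an event to temporary memory and read it back.
roundTrip :: RawEvent -> IO RawEvent
roundTrip ev = alloca $ \ptr -> poke ptr ev >> peek ptr
```

Note that the header has an arbitrary length, so the payload that follows it is not guaranteed to start on an 8-byte (or even 4-byte) boundary, whatever the instance declares.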

The Problem

Reading the data works quite well, but after some time the memory seems to get corrupted somehow:

>>> es <- DVS.mmapDVSData "dataset.aedat" 
>>> es S.! 1000
Event {address = Address {polarity = D, posX = 6, posY = 50}, timestamp = 74.771407s}
>>> :type es
es :: S.Vector (DVS.Event DVS.Address)
>>> _ <- evaluate (V.convert es :: V.Vector (DVS.Event DVS.Address))
>>> es S.! 1000
Event {address = Address {polarity = D, posX = 0, posY = 44}, timestamp = 0s}

Apparently accessing all elements of es somehow corrupts my memory. Or the garbage collector recycles it? Either way, this is strange. What can I do about that?

Upvotes: 3

Views: 335

Answers (1)

NovaDenizen

Reputation: 5325

mmapFileByteString performs an mmap, which creates a ForeignPtr, and sticks that ForeignPtr into a ByteString. unsafeUseAsCString coerces the ForeignPtr into a Ptr, from which you then create a new ForeignPtr. Then you take that second ForeignPtr and use it with S.unsafeFromForeignPtr0 to create a vector.

Having two ForeignPtrs pointing at the same memory is a no-no. The GHC runtime treats them as two separate objects, and the second one, created with newForeignPtr_, has no finalizer and does nothing to keep the first one alive. After all references to the ByteString are gone, the finalizer for its ForeignPtr will be called, deallocating the mmap and reclaiming the underlying memory. This leaves the second ForeignPtr pointing at an invalid region.

The solution here is to use Data.ByteString.Internal.toForeignPtr to extract and re-use the ForeignPtr from the ByteString. Replace the unsafeUseAsCString block with this:

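let (fptr, offset, len) = Data.ByteString.Internal.toForeignPtr bs
-- dropHeader may leave a non-zero offset (in bytes), so apply it to the
-- pointer; plusForeignPtr (Foreign.ForeignPtr) keeps the same finalizer
let count = len `div` 8
return $ S.unsafeFromForeignPtr0 (castForeignPtr (fptr `plusForeignPtr` offset)) count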
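One detail worth spelling out: the offset returned by toForeignPtr is measured in bytes, and S.unsafeFromForeignPtr0 assumes the data starts at the pointer itself. Depending on the bytestring version, a ByteString that has had its header dropped may report a non-zero offset. The sketch below shows one way to fold that offset into the pointer while keeping the original finalizer. The names payloadPtr and firstByte are hypothetical helpers for illustration, and plusForeignPtr requires a reasonably recent base.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, plusForeignPtr, withForeignPtr)
import Foreign.Storable (peek)

-- | Pointer to the first payload byte of a ByteString, plus its length.
-- The result shares the finalizer of the ByteString's own ForeignPtr,
-- so the memory stays alive for as long as the result is reachable.
payloadPtr :: B.ByteString -> (ForeignPtr Word8, Int)
payloadPtr bs =
  let (fptr, offset, len) = BI.toForeignPtr bs
  in (fptr `plusForeignPtr` offset, len)

-- | Read the first payload byte through the shared pointer.
-- The caller must make sure the ByteString is not empty.
firstByte :: B.ByteString -> IO Word8
firstByte bs = withForeignPtr (fst (payloadPtr bs)) peek
```

The pointer returned by payloadPtr could then be passed through castForeignPtr to S.unsafeFromForeignPtr0 together with the length divided by 8.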

IMHO, the real solution here is not to fiddle with all this stuff at all. Just read the file into a ByteString the conventional way, pull out 8-byte substrings from that, and manually convert them into Events. All this mmap and ForeignPtr stuff is dangerous, and not a whole lot faster than doing things safely and correctly. If you want absolute fastest performance without regard to safety, program in C.
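As a rough sketch of that safer route, the following decodes a header-free ByteString into events using only safe ByteString operations. RawEvent, word32BE and decodeEvents are hypothetical names for illustration. The code assumes each event is a 4-byte address followed by a 4-byte timestamp, both big-endian; check the byte order against the format specification before relying on it.

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as B
import Data.Word (Word32)

-- | Hypothetical stand-in for the event type: raw address and timestamp.
data RawEvent = RawEvent
  { rawAddress   :: !Word32
  , rawTimestamp :: !Word32
  } deriving (Eq, Show)

-- | Assemble a Word32 from up to four bytes, most significant byte first.
-- Assumption: the file stores its words in big-endian order.
word32BE :: B.ByteString -> Word32
word32BE = B.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- | Decode consecutive 8-byte records; trailing bytes that do not
-- form a complete record are ignored.
decodeEvents :: B.ByteString -> [RawEvent]
decodeEvents bs
  | B.length bs < 8 = []
  | otherwise =
      let (chunk, rest) = B.splitAt 8 bs
          (a, t)        = B.splitAt 4 chunk
      in RawEvent (word32BE a) (word32BE t) : decodeEvents rest
```

With the question's setup this would be called on the result of dropHeader applied to the file contents, for example after B.readFile. Because B.splitAt only slices the underlying buffer, no record is copied, and no finalizer handling is needed.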

Upvotes: 1
