Why haskell's matrix multiplication costs so many memory?

Question

I need to multiply [2^14 x 2^14] matrices. But first, let's talk just about [2^12 x 2^12] matrices. How many memory do I need? Assuming it's double matrices, so I need 8 bytes for one element.

((2^12 * 2^12 * 8) / 2^20) * 3 = 384 MiB

It's the worst case, when I need to store all three matrices in memory.

How many memory does haskell need? Let's see.

-> let m n = matrix (2^n) (2^n) ( \(i, j) -> (fromIntegral i) * (fromIntegral j) ) :: Matrix Double
-> let p n = let r = m n in multStd2 r r ! (n,n)
-> p 12
3.299742941184e12
(4.84 secs, 1,991,404,304 bytes)

It's ~2 GiB. Why so bad and what should I do except using C++?

UPD:

I'm using standard haskell's Data.Matrix module.

https://hackage.haskell.org/package/matrix

osa1 · Accepted Answer

Why so bad and what should I do except using C++.

Make sure you're using -O2. In my tests it reduced the maximum memory residence from 1095M to 544M. Runtime also reduces down to 2.65s from 3.48s.
Try to use a more efficient library, as also mentioned in the comments. Try hmatrix, for example.

Now, as for the "why". Here's how Matrix type is implemented in matrix library:

data Matrix a = M {
   nrows     :: {-# UNPACK #-} !Int -- ^ Number of rows.
 , ncols     :: {-# UNPACK #-} !Int -- ^ Number of columns.
 , rowOffset :: {-# UNPACK #-} !Int
 , colOffset :: {-# UNPACK #-} !Int
 , vcols     :: {-# UNPACK #-} !Int -- ^ Number of columns of the matrix without offset
 , mvect     :: V.Vector a          -- ^ Content of the matrix as a plain vector.
   }

(Assuming we're on a 64bit system)

We have 5 unpacked integers, which makes 5 x 8 bytes = 40 bytes. For the actual vector data we have a Vector. It's not an unboxed vector, which means every element of the vector will be a pointer to a heap allocated object. In our case, we have these functions:

m :: Int -> Matrix Double
m n =
  matrix (2^n) (2^n) ( \(i, j) -> (fromIntegral i) * (fromIntegral j) )

p :: Int -> Double
p n = let r = m n in multStd2 r r ! (n,n)

(types are added to make it clear)

So we have a Matrix Double of size 2^12 * 2^12 = 16777216 doubles.

Now to estimate how much memory it'll take to store that many doubles in the heap, let's look at GHC's heap object representation.

A heap object is represented as this:

typedef struct StgClosure_ {
    StgHeader   header;
    struct StgClosure_ *payload[FLEXIBLE_ARRAY];
} *StgClosurePtr;

typedef struct {
    const StgInfoTable* info;
} StgHeader;

typedef struct StgInfoTable_ {
    StgClosureInfo  layout;     /* closure layout info (one word) */
    StgHalfWord     type;       /* closure type */
    StgHalfWord     srt_bitmap;
    StgCode         code[FLEXIBLE_ARRAY];
} *StgInfoTablePtr;

typedef union {
    struct {                    /* Heap closure payload layout: */
        StgHalfWord ptrs;       /* number of pointers */
        StgHalfWord nptrs;      /* number of non-pointers */
    } payload;

    StgWord bitmap;               /* word-sized bit pattern describing */
                                  /*  a stack frame: see below */
    OFFSET_FIELD(large_bitmap_offset);  /* offset from info table to large bitmap structure */
    StgWord selector_offset;      /* used in THUNK_SELECTORs */

} StgClosureInfo;

We have quite a lot of bookkeeping here. To make things simpler, let's assume the matrix library is efficient enough to avoid thunking here(i.e. matrix function is strict in it's return value). In this case we can say that our payload will only have the actual double. So here's the math:

16777216 doubles. - 16777216 struct StgClosure_ * for Double type = 16777216 * 8 bytes. - Our payload will only have doubles = 16777216 * 8 bytes.

16777216 StgHeader for Double type.
- 16777216 StgInfoTable* = 16777216 * 8 bytes.
- A StgInfoTable consist of:
  - 1 StgClosureInfo = at least 3 words = 24 bytes.
  - 2 half words = 8 bytes.
  - An array of bytes. In our case this will be 0 bytes I think. So just a pointer = 8 bytes.

When we add up these it makes 1073M.

Now, the match was pretty hand-wavy but hopefully it's clear that for every heap object there's a lot of bookkeeping going on. To be more precise we'd need to take things like shared info tables(e.g. as far as I know same closures would have same info tables?) into account, I think.

Why haskell's matrix multiplication costs so many memory?

Answers (2)

Related Questions

Why haskell&#39;s matrix multiplication costs so many memory?

Answers (2)

Related Questions

Why haskell's matrix multiplication costs so many memory?