meownoid
meownoid

Reputation: 199

Why haskell's matrix multiplication costs so many memory?

I need to multiply [2^14 x 2^14] matrices. But first, let's talk just about [2^12 x 2^12] matrices. How many memory do I need? Assuming it's double matrices, so I need 8 bytes for one element.

((2^12 * 2^12 * 8) / 2^20) * 3 = 384 MiB

It's the worst case, when I need to store all three matrices in memory.

How many memory does haskell need? Let's see.

-> let m n = matrix (2^n) (2^n) ( \(i, j) -> (fromIntegral i) * (fromIntegral j) ) :: Matrix Double
-> let p n = let r = m n in multStd2 r r ! (n,n)
-> p 12
3.299742941184e12
(4.84 secs, 1,991,404,304 bytes)

It's ~2 GiB. Why so bad and what should I do except using C++?

UPD:

I'm using standard haskell's Data.Matrix module.

https://hackage.haskell.org/package/matrix

Upvotes: 0

Views: 544

Answers (2)

osa1
osa1

Reputation: 7088

Why so bad and what should I do except using C++.

  1. Make sure you're using -O2. In my tests it reduced the maximum memory residence from 1095M to 544M. Runtime also reduces down to 2.65s from 3.48s.

  2. Try to use a more efficient library, as also mentioned in the comments. Try hmatrix, for example.

Now, as for the "why". Here's how Matrix type is implemented in matrix library:

data Matrix a = M {
   nrows     :: {-# UNPACK #-} !Int -- ^ Number of rows.
 , ncols     :: {-# UNPACK #-} !Int -- ^ Number of columns.
 , rowOffset :: {-# UNPACK #-} !Int
 , colOffset :: {-# UNPACK #-} !Int
 , vcols     :: {-# UNPACK #-} !Int -- ^ Number of columns of the matrix without offset
 , mvect     :: V.Vector a          -- ^ Content of the matrix as a plain vector.
   }

(Assuming we're on a 64bit system)

We have 5 unpacked integers, which makes 5 x 8 bytes = 40 bytes. For the actual vector data we have a Vector. It's not an unboxed vector, which means every element of the vector will be a pointer to a heap allocated object. In our case, we have these functions:

m :: Int -> Matrix Double
m n =
  matrix (2^n) (2^n) ( \(i, j) -> (fromIntegral i) * (fromIntegral j) )

p :: Int -> Double
p n = let r = m n in multStd2 r r ! (n,n)

(types are added to make it clear)

So we have a Matrix Double of size 2^12 * 2^12 = 16777216 doubles.

Now to estimate how much memory it'll take to store that many doubles in the heap, let's look at GHC's heap object representation.

A heap object is represented as this:

typedef struct StgClosure_ {
    StgHeader   header;
    struct StgClosure_ *payload[FLEXIBLE_ARRAY];
} *StgClosurePtr;

typedef struct {
    const StgInfoTable* info;
} StgHeader;

typedef struct StgInfoTable_ {
    StgClosureInfo  layout;     /* closure layout info (one word) */
    StgHalfWord     type;       /* closure type */
    StgHalfWord     srt_bitmap;
    StgCode         code[FLEXIBLE_ARRAY];
} *StgInfoTablePtr;

typedef union {
    struct {                    /* Heap closure payload layout: */
        StgHalfWord ptrs;       /* number of pointers */
        StgHalfWord nptrs;      /* number of non-pointers */
    } payload;

    StgWord bitmap;               /* word-sized bit pattern describing */
                                  /*  a stack frame: see below */
    OFFSET_FIELD(large_bitmap_offset);  /* offset from info table to large bitmap structure */
    StgWord selector_offset;      /* used in THUNK_SELECTORs */

} StgClosureInfo;

We have quite a lot of bookkeeping here. To make things simpler, let's assume the matrix library is efficient enough to avoid thunking here(i.e. matrix function is strict in it's return value). In this case we can say that our payload will only have the actual double. So here's the math:

16777216 doubles. - 16777216 struct StgClosure_ * for Double type = 16777216 * 8 bytes. - Our payload will only have doubles = 16777216 * 8 bytes.

  • 16777216 StgHeader for Double type.
    • 16777216 StgInfoTable* = 16777216 * 8 bytes.
    • A StgInfoTable consist of:
      • 1 StgClosureInfo = at least 3 words = 24 bytes.
      • 2 half words = 8 bytes.
      • An array of bytes. In our case this will be 0 bytes I think. So just a pointer = 8 bytes.

When we add up these it makes 1073M.


Now, the match was pretty hand-wavy but hopefully it's clear that for every heap object there's a lot of bookkeeping going on. To be more precise we'd need to take things like shared info tables(e.g. as far as I know same closures would have same info tables?) into account, I think.

Upvotes: 3

Reid Barton
Reid Barton

Reputation: 15029

Two points.

  • You are looking for "unboxed" arrays (or vectors or matrices or whatever), not "boxed" ones. Unboxed arrays have the same flat representation as C, while boxed arrays are arrays of pointers to heap objects that are stored outside the array. In exchange for the additional space usage, boxed arrays can store any type of element and can also store unevaluated values, making them non-strict in their contents. For a type the same size as a machine word (assuming a 64-bit system) like Double, a boxed array will normally use three times as much space as an unboxed array.

  • The number "1,991,404,304 bytes" does not mean what you seem to think it means. It is quite literally the total amount of memory allocated during the computation. This does not say anything about the maximum amount of space used at any one time (except that it is less than 2GB). For peak space usage values, run with +RTS -s and look at the "maximum residency" value in the output when the program finishes.

Upvotes: 5

Related Questions