user2827214

Reputation: 1191

Is this a practical way to resolve 'Not enough memory' from LuaJIT with Torch?

StanfordNLP's TreeLSTM, when used with a dataset of more than 30K instances, causes LuaJIT to fail with "Not enough memory." I am resolving this by using LuaJIT Data Structures (LDS). To get the dataset outside of Lua's heap, the trees need to be placed in an LDS.Vector.

Since the LDS.Vector holds cdata, the first step was to make the Tree type into a cdata object:

local ffi = require('ffi')

ffi.cdef([[
typedef struct CTree {
   struct CTree* parent;        /* NULL for the root */
   int num_children;
   struct CTree* children[25];  /* fixed-capacity child array */
   int idx;                     /* node index within the sentence */
   int gold_label;
   int leaf_idx;
} CTree;
]])
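
For concreteness, pushing nodes into the vector might look like the sketch below. The constructor and accessor names (lds.Vector, push_back, get) are assumptions to be checked against the lds README, and the sketch assumes get returns a reference to the stored struct rather than a copy:

-- Sketch only: keep CTree structs in a malloc-backed LDS vector,
-- outside LuaJIT's GC-managed heap.
local lds = require('lds')

local trees = lds.Vector(ffi.typeof('CTree'))  -- assumed constructor signature
trees:push_back(ffi.new('CTree'))              -- zero-initialized parent node
trees:push_back(ffi.new('CTree'))              -- zero-initialized child node

local parent = trees:get(0)                    -- assumed to return a reference
local child  = trees:get(1)
child.parent = parent                          -- LuaJIT takes the struct's address
parent.children[0] = child
parent.num_children = 1

Note that if the vector reallocates as it grows, any parent/children pointers taken earlier are invalidated, so it is safer to reserve the full capacity up front (if the library supports it) or to link the nodes only after all of them have been pushed.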

There are also small changes needed in read_data.lua to handle the new cdata CTree type. So far, using LDS seems like a reasonable way around the memory limit; however, each tree also requires a field named 'composer'.

The composer field is of type nn.gModule. Continuing with this solution would mean creating a cdata typedef for nn.gModule, including typedefs for its members. Before going any further: does this seem like the right direction to take? Does anyone have experience with this problem?

Upvotes: 9

Views: 1467

Answers (1)

kst

Reputation: 21

As you've discovered, representing structured data in a LuaJIT heap-friendly manner is a bit of a pain at the moment.

In the Tree-LSTM implementation, each tree table holds a reference to a composer instance, mainly for expediency.

One workaround that avoids typedef-ing nn.gModule is to use the existing idx field to index into a table of composer instances. In this approach, the pair (sentence_idx, node_idx) uniquely identifies a composer in a global two-level table. To avoid memory issues, the current cleanup code can be replaced with a line that sets the corresponding entry in that table to nil, as in the sketch below.
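
A minimal sketch of that bookkeeping, in plain Lua (the names composers, sentence_idx, node_idx, and make_composer are illustrative, not taken from the Tree-LSTM source):

-- Global two-level table: composers[sentence_idx][node_idx] -> nn.gModule
local composers = {}

-- Fetch (or lazily build) the composer for one tree node.
local function get_composer(sentence_idx, node_idx, make_composer)
  local row = composers[sentence_idx]
  if row == nil then
    row = {}
    composers[sentence_idx] = row
  end
  if row[node_idx] == nil then
    row[node_idx] = make_composer()  -- constructs the nn.gModule
  end
  return row[node_idx]
end

-- Cleanup: drop the reference so the module can be garbage-collected.
local function free_composer(sentence_idx, node_idx)
  local row = composers[sentence_idx]
  if row ~= nil then
    row[node_idx] = nil
  end
end

Since CTree already stores idx, the cdata side only ever holds integers, while the nn.gModule objects stay in ordinary Lua tables where the garbage collector can manage them.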

Upvotes: 2
