duhaime

Reputation: 27611

Mallet: java.lang.OutOfMemoryError with 1024GB Memory allocation

I am trying to use Mallet to run topic modeling on a ~1GB text file with 11,403,956 rows. From the Mallet directory, I cd to bin and raise the memory allocation to 1024GB:

set MALLET_MEMORY=1024G

I then try to run the command:

bin/mallet import-file --input combined_bios.txt --output dh_size.mallet --keep-sequence --remove-stopwords

However, this throws a memory error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at gnu.trove.TObjectIntHashMap.rehash(TObjectIntHashMap.java:170)
        at gnu.trove.THash.postInsertHook(THash.java:359)
        at gnu.trove.TObjectIntHashMap.put(TObjectIntHashMap.java:155)
        at cc.mallet.types.Alphabet.lookupIndex(Alphabet.java:115)
        at cc.mallet.types.Alphabet.lookupIndex(Alphabet.java:123)
        at cc.mallet.types.FeatureSequence.add(FeatureSequence.java:131)
        at cc.mallet.pipe.TokenSequence2FeatureSequence.pipe(TokenSequence2FeatureSequence.java:44)
        at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:294)
        at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:282)
        at cc.mallet.types.InstanceList.addThruPipe(InstanceList.java:267)
        at cc.mallet.classify.tui.Csv2Vectors.main(Csv2Vectors.java:290)

Is there a workaround for such situations? Any help others can offer would be greatly appreciated!

Upvotes: 3

Views: 1657

Answers (1)

labowhat

Reputation: 157

If you are on Linux or OS X, I think you might be changing the wrong variable. The one you are setting is read by bin/mallet.bat (the Windows launcher), but you want to change the one in the executable script at bin/mallet (i.e. the file without the .bat extension) and raise its value:

MEMORY=1g
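A minimal sketch of making that edit from the command line, assuming bin/mallet contains a line beginning with `MEMORY=` as quoted above. The function name `raise_mallet_memory` and the `8g` value are illustrative only; pick a value the machine can actually back with RAM. The original script is kept as a `.bak` copy.

```bash
#!/usr/bin/env bash
# Sketch only: raise the heap size hard-coded in MALLET's Unix launcher script.
# Assumes the launcher has a line starting with "MEMORY=" (e.g. "MEMORY=1g").
# raise_mallet_memory is a hypothetical helper name, not part of MALLET.
raise_mallet_memory() {
  local launcher="$1"    # path to the launcher, e.g. bin/mallet
  local new_memory="$2"  # new value for MEMORY, e.g. 8g

  # Refuse to continue if there is no MEMORY= line to edit.
  if ! grep -q '^MEMORY=' "$launcher"; then
    echo "no MEMORY= line found in $launcher" >&2
    return 1
  fi

  # Edit in place; -i.bak keeps a backup and is accepted by both GNU and BSD (macOS) sed.
  sed -i.bak "s/^MEMORY=.*/MEMORY=${new_memory}/" "$launcher"

  # Print the resulting line so the change can be checked.
  grep '^MEMORY=' "$launcher"
}

# Usage, run from the MALLET root directory (8g is an example value):
# raise_mallet_memory bin/mallet 8g
```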

This is also described under "Issues with Big Data" in this Mallet tutorial:

http://programminghistorian.org/lessons/topic-modeling-and-mallet
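A related detail for Linux or OS X: `set MALLET_MEMORY=1024G` is Windows `cmd` syntax. In bash, the `set` builtin treats its arguments as new positional parameters, so that line defines no variable at all. The small bash sketch below demonstrates this (the function name `show_set_behaviour` is hypothetical); its body runs in a subshell so it does not touch the caller's environment.

```bash
#!/usr/bin/env bash
# Sketch: what the Windows-style "set NAME=value" line does when typed into bash.
# The body is in ( ... ) so it runs in a subshell and leaves the caller's shell alone.
show_set_behaviour() (
  unset MALLET_MEMORY      # start from a state where the variable does not exist
  set MALLET_MEMORY=1024G  # bash: replaces $1, $2, ... instead of assigning a variable
  echo "\$1 is: $1"
  echo "MALLET_MEMORY is: ${MALLET_MEMORY:-<unset>}"
)

show_set_behaviour
# $1 is: MALLET_MEMORY=1024G
# MALLET_MEMORY is: <unset>
```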

Upvotes: 4
