usr2108
usr2108

Reputation: 31

Are there any tricks to reduce memory usage when storing String data type in hashmap?

I need to store value pair (word and number) in the Map.

I am trying to use TObjectIntHashMap from Trove library with char[] as the key, because I need to minimize the memory usage. But with this method, I can not get the value when I use get() method.
I guess I can not use primitive char array to store in a Map because hashcode issues.

I tried to use TCharArrayList but that takes much memory also.
I read in another stackoverflow question that similar with my purpose and have suggestion to use TLongIntHashMap , store encode values of String word in long data type. In this case my words may contains of latin characters or various other characters that appears in wikipedia collections, I do not know whether the Long is enough for encode or not.

I have tried using Trie data structure to store it, but I need to consider my performance also and choose the best for both memory usage and performance.

Do you have any idea or suggestion for this issue?

Upvotes: 1

Views: 982

Answers (2)

Dilum Ranatunga
Dilum Ranatunga

Reputation: 13374

Write your own class that implements CharSequence, and write your own implementation of equals() and hashcode(). The implementation would also pre-allocate large shared char[] storage, and use bits of it at a time. (You can definitely incorporate @Peter Lawrey's excellent suggestion into this, too, and use byte[] storage.)

There's also an opportunity to do a 'soft intern()' using an LRU cache. I've noted where the cache would go.

Here's a simple demonstration of what I mean. Note that if you need heavily concurrent writes, you can try to improve the locking scheme below...

public final class CompactString implements CharSequence {
  private final char[] _data;
  private final int _offset;
  private final int _length;
  private final int _hashCode;

  private static final Object _lock = new Object();
  private static char[] _storage;
  private static int _nextIndex;

  private static final int LENGTH_THRESHOLD = 128;

  private CompactString(char[] data, int offset, int length, int hashCode) {
    _data = data; _offset = offset; _length = length; _hashCode = hashCode;
  }

  private static final CompactString EMPTY = new CompactString(new char[0], 0, 0, "".hashCode());

  private static allocateStorage() {
    synchronized (_lock) {
      _storage = new char[1024];
      _nextIndex = 0;
    }
  }

  private static CompactString storeInShared(String value) {
    synchronized (_lock) {
      if (_nextIndex + value.length() > _storage.length) {
        allocateStorage();
      }
      int start = _nextIndex; 
      // You would need to change this loop and length to do UTF encoding.
      for (int i = 0; i < value.length(); ++i) {
        _storage[_nextIndex++] = value.charAt(i);
      }
      return new CompactString(_storage, start, value.length(), value.hashCode());
    }
  }

  static {
    allocateStorage();
  }

  public static CompactString valueOf(String value) {
    // You can implement a soft .intern-like solution here.
    if (value == null) {
      return null;
    } else if (value.length() == 0) {
      return EMPTY;
    } else if (value.length() > LENGTH_THRESHOLD) {
      // You would need to change .toCharArray() and length to do UTF encoding.
      return new CompactString(value.toCharArray(), 0, value.length(), value.hashCode());
    } else {
      return storeInShared(value);
    }
  }

  // left to reader: implement equals etc.
}

Upvotes: 0

Peter Lawrey
Peter Lawrey

Reputation: 533510

It sounds like the most compact way to store the data is to use a byte[] encoded in UTF-8 or similar. You can wrap this in your own class or write you own HashMap which allows byte[] as a key.

I would reconsider how much time it is worth spending to save some memory. If you are talking about a PC or Server, at minimum wage you need to save 1 GB for an hours work so if you are only looking to save 100 MB that's about 6 minutes including testing.

Upvotes: 3

Related Questions