Reputation: 27104
I'm looking for a memory-efficient way to store tabular data typically consisting of about 150000 rows x 200 columns. The cell values are Strings with lengths somewhere in the range of 0-200 characters.
The data rows are initially generated by taking all possible combinations of rows from smaller tables. So while all rows are unique, the columns contain many copies of the same value. The data is not read-only. Some of the columns (typically up to 20 of the 200) get updated with values that depend on the values of other columns. And new columns (also about 20 I'd expect) with computed values are going to be added to the table.
The existing legacy code heavily depends on the data being stored in a List of Map<String, String>s that map column name to cell value.
But the current implementation, an ArrayList<HashMap<String,String>>, is taking many gigabytes of memory.
I tried calling String.intern() on the keys and values that get inserted into the HashMap. That halved the memory footprint, but it still seems horribly inefficient to keep all those identical Map.Entry objects around.
So I was wondering: can you suggest a more memory-efficient data structure that shares the identical column values but still lets me keep the external List<Map<String, String>> interface the same?
We already have Guava on the classpath, so using collections from Guava is fine.
Upvotes: 2
Views: 2493
Reputation: 53516
I have found GS-Collections (since renamed Eclipse Collections) to be much better suited for memory-efficient Maps/Sets. They avoid much of the overhead of storing map entry objects by using some clever tricks with arrays behind the scenes.
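To illustrate the general idea of backing maps with arrays instead of entry objects, here is a minimal plain-JDK sketch. It is not GS-Collections code; the names CompactTable, RowView, columnPosition and addRow are hypothetical. It assumes every row draws from one shared set of column names, so each row can be a single String[] while callers still see a List<Map<String, String>>. One behavioural difference from HashMap in this sketch: a row's entry set lists every registered column, with null for cells that were never set, and removing keys is not supported.

```java
import java.util.AbstractList;
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical sketch: a List<Map<String, String>> view over one String[] per row. */
class CompactTable extends AbstractList<Map<String, String>> {
    // Shared by all rows: column name -> position inside each row array.
    private final Map<String, Integer> columnIndex = new HashMap<>();
    private final List<String> columnNames = new ArrayList<>();
    private final List<String[]> rows = new ArrayList<>();

    public CompactTable(List<String> initialColumns) {
        for (String name : initialColumns) {
            columnPosition(name);
        }
    }

    /** Returns the position of a column, registering the column if it is new. */
    private int columnPosition(String name) {
        Integer idx = columnIndex.get(name);
        if (idx == null) {
            idx = columnNames.size();
            columnIndex.put(name, idx);
            columnNames.add(name);
        }
        return idx;
    }

    /** Appends an empty row and returns its map view. */
    public Map<String, String> addRow() {
        rows.add(new String[columnNames.size()]);
        return get(rows.size() - 1);
    }

    @Override public int size() { return rows.size(); }

    @Override public Map<String, String> get(int rowNumber) {
        if (rowNumber < 0 || rowNumber >= rows.size()) {
            throw new IndexOutOfBoundsException("row " + rowNumber);
        }
        return new RowView(rowNumber); // small, short-lived view object
    }

    /** Map view of one row; reads and writes go straight to the row array. */
    private final class RowView extends AbstractMap<String, String> {
        private final int rowNumber;

        RowView(int rowNumber) { this.rowNumber = rowNumber; }

        @Override public String get(Object key) {
            Integer idx = columnIndex.get(key);
            String[] cells = rows.get(rowNumber);
            // Rows created before a column was added have shorter arrays.
            return (idx == null || idx >= cells.length) ? null : cells[idx];
        }

        @Override public String put(String key, String value) {
            int idx = columnPosition(key);
            String[] cells = rows.get(rowNumber);
            if (idx >= cells.length) { // grow lazily when a new column is written
                cells = Arrays.copyOf(cells, columnNames.size());
                rows.set(rowNumber, cells);
            }
            String old = cells[idx];
            cells[idx] = value;
            return old;
        }

        @Override public Set<Map.Entry<String, String>> entrySet() {
            return new AbstractSet<Map.Entry<String, String>>() {
                @Override public int size() { return columnNames.size(); }

                @Override public Iterator<Map.Entry<String, String>> iterator() {
                    return new Iterator<Map.Entry<String, String>>() {
                        private int position = 0;

                        @Override public boolean hasNext() {
                            return position < columnNames.size();
                        }

                        @Override public Map.Entry<String, String> next() {
                            if (!hasNext()) throw new NoSuchElementException();
                            String name = columnNames.get(position++);
                            return new AbstractMap.SimpleImmutableEntry<>(name, RowView.this.get(name));
                        }
                    };
                }
            };
        }
    }
}
```

With a layout like this, each row costs one array of references instead of a HashMap plus one entry object per cell. Sharing identical cell values still depends on interning (or otherwise canonicalizing) the strings before they are stored, as the question already does.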
Upvotes: 3