user1507455

Reputation: 1133

Java: highly segmented data (multi-dimensional arrays). How to economize space?

I create models that model complex systems. As "agents" flow through states in the system, I keep track of various characteristics. Currently, my method is using multi-dimensional arrays. For example, I report the current state of my agents every month of every year. I also need to keep track of these agents' properties. So, I use multi-dimensional arrays, as follows:

int[][][][][] reporting = new int[NUM_YEARS][12][NUM_POSSIBLE_SIZES][NUM_POSSIBLE_EXPERIENCES][MAX_AGE]; 

for (Agent a : agents) {
    // size, experience, and age are int fields on Agent
    reporting[currentYear][currentMonth][a.size][a.experience][a.age]++;
}

At the end of the simulation, I output all values to an external file.

My problem is this:

These arrays get highly multi-dimensional (e.g. I add width, numOwners, price, weight, height, and more). I like the specificity this provides, but Java allocates and zero-initializes every int slot the moment I create the reporting structure, so memory use grows with every dimension I add.

In a previous post I learned that it's best to make the segments with smaller ranges earlier in the array, for example:

int[][] reporting = new int[2][20];

is better than

int[][] reporting = new int[20][2];
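A rough sketch of why, assuming the usual JVM layout: a Java 2-D array is an array of row arrays, and every row array is a separate heap object with its own header, so fewer rows means less overhead. (The arithmetic below is illustrative only; the per-object header size varies by JVM.)

```java
// Illustrative arithmetic: new int[outer][inner] allocates 1 outer array
// plus `outer` separate row arrays, each with its own object header.
public class RowOrder {
    // Total array objects allocated by new int[outer][inner].
    static int arrayObjects(int outer) {
        return 1 + outer;
    }

    public static void main(String[] args) {
        System.out.println("new int[2][20]: " + arrayObjects(2) + " array objects");  // 3
        System.out.println("new int[20][2]: " + arrayObjects(20) + " array objects"); // 21
    }
}
```

Both layouts hold the same 40 ints; the difference is purely the number of row-array objects (3 vs. 21 here), which is why putting the smaller dimension first saves space.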

However, even with this optimization, I sometimes run out of heap space. I've found I typically use only 1-2% of the slots available! Any hints on conserving space but keeping the same segmentation depth for my reporting?

I've considered writing results out incrementally instead (keeping my output buffers open), but that doesn't seem wise: I usually have five or so of these multi-dimensional reporting structures, so I'd have to keep five BufferedWriters open at once.

Thanks!

Upvotes: 1

Views: 208

Answers (1)

scottb

Reputation: 10084

The broad overview of one possible solution is outlined here. This solution presumes that only a very small fraction of the possible index combinations is ever realized, and that the great majority are impossible or not meaningful.

  • Use a popular, freely available Java RDBMS to create an in-memory table

  • The primary key for your table should be a composite key containing all the criteria that formed the indexes to your array.

  • When processing an agent, search for a record in the table that possesses all the criteria.

  • If a record is found, modify its data.

  • If no record is found, enter a new record in the table with the new criteria as the primary key.

  • When all the agents have been processed, you have a reasonably compact, indexed, searchable representation of your data analysis in memory.
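The steps above might be sketched as follows with HSQLDB's in-memory mode. The table name `report`, the column names, and the `record` helper are my own invention for illustration, and this assumes the HSQLDB jar is on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class AgentReport {

    // Increment the count for one combination of criteria: try an UPDATE first;
    // if no row matched, INSERT a new record keyed by the full composite key.
    static void record(Connection conn, int yr, int mo, int sz, int xp, int ag) throws SQLException {
        try (PreparedStatement upd = conn.prepareStatement(
                "UPDATE report SET n = n + 1 WHERE yr=? AND mo=? AND sz=? AND xp=? AND ag=?")) {
            upd.setInt(1, yr); upd.setInt(2, mo); upd.setInt(3, sz);
            upd.setInt(4, xp); upd.setInt(5, ag);
            if (upd.executeUpdate() == 0) {
                try (PreparedStatement ins = conn.prepareStatement(
                        "INSERT INTO report (yr, mo, sz, xp, ag, n) VALUES (?,?,?,?,?,1)")) {
                    ins.setInt(1, yr); ins.setInt(2, mo); ins.setInt(3, sz);
                    ins.setInt(4, xp); ins.setInt(5, ag);
                    ins.executeUpdate();
                }
            }
        }
    }

    static int demo() throws SQLException {
        // jdbc:hsqldb:mem: gives a purely in-memory database -- no files, no server.
        try (Connection conn = DriverManager.getConnection("jdbc:hsqldb:mem:report", "SA", "")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE report (yr INT, mo INT, sz INT, xp INT, ag INT, "
                         + "n INT NOT NULL, PRIMARY KEY (yr, mo, sz, xp, ag))");
            }
            record(conn, 0, 3, 2, 1, 40); // two agents share the same criteria
            record(conn, 0, 3, 2, 1, 40);
            try (PreparedStatement q = conn.prepareStatement(
                    "SELECT n FROM report WHERE yr=? AND mo=? AND sz=? AND xp=? AND ag=?")) {
                q.setInt(1, 0); q.setInt(2, 3); q.setInt(3, 2); q.setInt(4, 1); q.setInt(5, 40);
                try (ResultSet rs = q.executeQuery()) {
                    rs.next();
                    return rs.getInt(1);
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(demo()); // prints 2
    }
}
```

Only rows for combinations that actually occur are ever stored, so at 1-2% occupancy the table is far smaller than the fully allocated array.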

This solution has certain advantages. If the number of realized combinations is still large relative to available memory, you can make the table disk-based. This lets you handle datasets far too large for any in-memory data structure, at the cost of performance. Another advantage is that because the table is maintained by an RDBMS engine, you can search on it using a very powerful query system. You get that added versatility basically for free.

The main disadvantage of this solution is that it requires JDBC or some entity-mapping framework and so it might require you to learn a new API. Another disadvantage is that, while memory tables are relatively fast, this solution will still be slower than one which relies on primitive in-memory data structures.

There are a few RDBMS options. I am a fan of HSQLDB (http://www.hsqldb.org), currently at version 2.3.0. It supports both cached and in-memory tables, is mature, has a small memory footprint, and can be used in stand-alone mode (making it virtually administration-free). Other freely available RDBMS engines for Java include Apache's Derby and SQLite, which can be used from Java with a separately available JDBC driver. There are also any number of libraries, open source or commercial, that provide sophisticated, customizable, and robust reporting for JDBC datasets (such as iReport from Jaspersoft).

Upvotes: 2
