Reputation: 9
I am working on refactoring a small portion of an open-source, large-scale configuration management system for my university.
We're using some open-source machine learning tools such as Weka, and the part I've been assigned to refactor deals with data mining and rule construction.
The open-source files we've been using from Liverpool and Japan work well, but there are memory usage issues when we run the program on large-scale projects.
I've isolated the major memory hogs and concluded that I need a different data structure to store and manipulate the data. As it stands, the program uses multidimensional arrays of integers, objects, strings, etc. that end up becoming very large.
Several methods simply reconfigure the associations after we derive rules for behaviors. In many cases, we are only adding or removing a single element, or simply flattening the multidimensional arrays.
I program primarily in C/C++, so I am not an expert on the data structures available in Java. I'd like to replace the static arrays with a dynamic structure that can be easily resized without having to create a second multidimensional array.
Right now, every time we add or remove rules, objects, or other miscellaneous data, we have to create an entirely new multidimensional array and immediately copy the old contents into it.
I'd like to keep using the same structure and just add a new row or column. I'd also like to manipulate the data in place: save a temporary value, overwrite previous values, shift elements left or right, etc.
Can anyone think of any data structures in Java that would fit the bill?
On a related note, I have looked into explicit garbage collection, but found that I can only suggest that the JVM collect, either by calling System.gc() or by tuning the JVM's garbage collection behavior. Is there a better or more effective way?
Regards, Edm
Upvotes: 0
Reputation: 51711
To replace static arrays with a dynamic structure, use an ArrayList, which grows automatically as data is added. For a two-dimensional structure, use a List of Lists:
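import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
List<List<Integer>> dataStore = new ArrayList<List<Integer>>();
dataStore.add(new ArrayList<Integer>());
// Copy into an ArrayList: Arrays.asList alone returns a fixed-size list that throws on add/remove
dataStore.add(new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4)));
// Access [1][3] as
System.out.println(dataStore.get(1).get(3)); // prints 4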
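Because the question also asks about adding columns, removing elements and flattening, here is a minimal sketch of those operations on a List of Lists. GridOps and its method names are hypothetical, chosen only for illustration, and the sketch assumes every row is a resizable ArrayList (a row built with plain Arrays.asList would throw UnsupportedOperationException on add or remove).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: grows a List<List<Integer>> in place instead of
// allocating and copying a new 2-D array on every change.
class GridOps {

    // Append a new row of the given width, filled with a default value.
    static void addRow(List<List<Integer>> grid, int width, int fill) {
        List<Integer> row = new ArrayList<>(width);
        for (int i = 0; i < width; i++) {
            row.add(fill);
        }
        grid.add(row);
    }

    // Append a new column by adding one element to the end of every row.
    static void addColumn(List<List<Integer>> grid, int fill) {
        for (List<Integer> row : grid) {
            row.add(fill);
        }
    }

    // Remove column c from every row; later elements shift left by one.
    static void removeColumn(List<List<Integer>> grid, int c) {
        for (List<Integer> row : grid) {
            row.remove(c); // int argument selects remove(int index), not remove(Object)
        }
    }

    // Flatten row by row into a single list.
    static List<Integer> flatten(List<List<Integer>> grid) {
        List<Integer> flat = new ArrayList<>();
        for (List<Integer> row : grid) {
            flat.addAll(row);
        }
        return flat;
    }

    public static void main(String[] args) {
        List<List<Integer>> grid = new ArrayList<>();
        grid.add(new ArrayList<>(Arrays.asList(1, 2, 3)));
        addRow(grid, 3, 0);     // [[1, 2, 3], [0, 0, 0]]
        addColumn(grid, 9);     // [[1, 2, 3, 9], [0, 0, 0, 9]]
        removeColumn(grid, 0);  // [[2, 3, 9], [0, 0, 9]]
        System.out.println(grid);
        System.out.println(flatten(grid)); // [2, 3, 9, 0, 0, 9]
    }
}
```

Each operation touches only the rows it needs, so nothing is copied into a second structure.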
Since you touched on controlling garbage collection (which Java actually handles quite well on its own), memory management seems to be of paramount importance; after all, it is what prompted the refactoring in the first place.
You could look into the GoF Flyweight pattern, which cuts down the application's memory footprint by sharing objects instead of duplicating them. To allow sharing, flyweight objects need to be immutable.
Pseudocode:
// Put a shared flyweight object at [2][1] (row 2 must already have an index 1)
fwObjStore.get(2).set(1, FWObjFactory.getInstance(fwKey));

public class FWObjFactory {
    // Cache of shared, immutable flyweights keyed by their intrinsic state
    private static final Map<String, FWObject> fwMap = new HashMap<String, FWObject>();

    public static FWObject getInstance(String fwKey) {
        // Create the flyweight only on the first request for this key
        if (!fwMap.containsKey(fwKey)) {
            fwMap.put(fwKey, newFwFromKey(fwKey));
        }
        return fwMap.get(fwKey);
    }

    private static FWObject newFwFromKey(String fwKey) {
        // ...
    }
}
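A runnable version of the flyweight idea, as a sketch: RuleLabel is a hypothetical stand-in for FWObject, and RuleLabelFactory uses Map.computeIfAbsent instead of the containsKey/put pair. All class names here are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable value class: final field, no setters, so one
// instance can safely be shared by many cells of the data structure.
final class RuleLabel {
    private final String name;

    RuleLabel(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

// Flyweight factory: returns the cached instance for a key, creating it only once.
final class RuleLabelFactory {
    private static final Map<String, RuleLabel> CACHE = new HashMap<>();

    private RuleLabelFactory() {}

    static RuleLabel getInstance(String key) {
        // computeIfAbsent builds and stores the object only on the first request
        return CACHE.computeIfAbsent(key, RuleLabel::new);
    }

    static int cachedCount() {
        return CACHE.size();
    }
}

class FlyweightDemo {
    public static void main(String[] args) {
        RuleLabel a = RuleLabelFactory.getInstance("buys_milk");
        RuleLabel b = RuleLabelFactory.getInstance("buys_milk");
        System.out.println(a == b);                         // true: same shared object
        System.out.println(RuleLabelFactory.cachedCount()); // 1
    }
}
```

The cache is a plain HashMap, so it is not thread-safe (ConcurrentHashMap is the usual swap), and it never shrinks, so an unbounded set of keys would turn it into exactly the kind of strong-reference cache that leaks memory.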
Upvotes: 1
Reputation: 18148
If your matrix contains a lot of nulls/zeroes/falses/empty strings, you can save space by using a sparse matrix implementation. Matrix-toolkits has several sparse matrices that you can use or modify to suit your needs, or you can just use a hashmap with an {x, y} tuple as the key. (The hashmap approach has the added advantage that several external, disk-backed hashmap implementations are available, e.g. BerkeleyDB, so you're unlikely to run out of memory.)
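A minimal sketch of the hashmap variant, assuming int cells where 0 is the default value. SparseIntMatrix is a hypothetical class; rather than a tuple object, it packs (row, col) into a single long key, which avoids writing equals/hashCode for a separate key class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse matrix: only non-zero cells are stored, in a HashMap
// keyed by the (row, col) pair packed into one long.
class SparseIntMatrix {
    private final Map<Long, Integer> cells = new HashMap<>();

    // Row goes in the high 32 bits, col in the low 32 bits.
    private static long key(int row, int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    int get(int row, int col) {
        // Cells that were never set (or were reset to 0) read as 0.
        return cells.getOrDefault(key(row, col), 0);
    }

    void set(int row, int col, int value) {
        if (value == 0) {
            cells.remove(key(row, col)); // keep the map sparse
        } else {
            cells.put(key(row, col), value);
        }
    }

    int storedCells() {
        return cells.size();
    }

    public static void main(String[] args) {
        SparseIntMatrix m = new SparseIntMatrix();
        m.set(100_000, 250_000, 7);
        System.out.println(m.get(100_000, 250_000)); // 7
        System.out.println(m.get(5, 5));             // 0
        System.out.println(m.storedCells());         // 1
    }
}
```

Each stored entry still costs a boxed Long, a boxed Integer and a map node, so this only pays off when most cells are empty.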
Upvotes: 1
Reputation: 19573
Why not nest one List inside another? Like so:
List<List<String>> rowColumns = new ArrayList<>();
// Add a row with two entries, or columns (copied into an ArrayList so the row can still grow):
List<String> oneRow = new ArrayList<>(Arrays.asList("Hello", "World!"));
rowColumns.add(oneRow);
Also, consider using a Map whose values are Lists.
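One way to apply the Map suggestion, as a sketch: group values under a key with computeIfAbsent, so each key's List is created on first use and grows on demand. MultiMapDemo and the sample item names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example of a Map whose values are Lists.
class MultiMapDemo {
    static Map<String, List<String>> group(String[][] pairs) {
        Map<String, List<String>> byKey = new HashMap<>();
        for (String[] p : pairs) {
            // Create the list for this key on first use, then append to it.
            byKey.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        return byKey;
    }

    public static void main(String[] args) {
        String[][] pairs = { {"bread", "butter"}, {"bread", "jam"}, {"milk", "cereal"} };
        Map<String, List<String>> byKey = group(pairs);
        System.out.println(byKey.get("bread")); // [butter, jam]
        System.out.println(byKey.get("milk"));  // [cereal]
    }
}
```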
You should generally never have to deal with garbage collection explicitly in Java. Instead, look for memory leaks first. If you find one, check for background threads that don't terminate as they should, or for strong references held in caches. If you want to read about the latter issue, you can start here and here.
Upvotes: 0
Reputation: 32787
There's no true multidimensional array in Java; Java has arrays of arrays.
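To illustrate the arrays-of-arrays point for a C/C++ reader: each row of an int[][] is a separate int[] object, so rows may differ in length and a single row can be replaced without copying the whole grid. JaggedDemo and appendToRow are hypothetical names; the sketch uses only java.util.Arrays.

```java
import java.util.Arrays;

// int[][] is an array of references to separate int[] rows.
class JaggedDemo {
    // Append one element to a single row; only that row is reallocated.
    static void appendToRow(int[][] grid, int row, int value) {
        int[] old = grid[row];
        int[] grown = Arrays.copyOf(old, old.length + 1);
        grown[old.length] = value;
        grid[row] = grown; // the outer array and the other rows are untouched
    }

    public static void main(String[] args) {
        int[][] grid = new int[3][];   // allocates the outer array only; rows are null
        grid[0] = new int[] {1};
        grid[1] = new int[] {1, 2};
        grid[2] = new int[] {1, 2, 3};
        appendToRow(grid, 0, 9);
        System.out.println(Arrays.deepToString(grid)); // [[1, 9], [1, 2], [1, 2, 3]]
    }
}
```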
You can use an ArrayList whose type parameter is itself an ArrayList:
ArrayList<ArrayList<yourType>> myList = new ArrayList<ArrayList<yourType>>();
Also, don't worry about GC; it will collect as and when required.
Upvotes: 0
Reputation: 528
I would look into using a "List of Lists". For example, you could declare something like
List<List<Object>> mArray = new ArrayList<List<Object>>();
Any time you need to add a new "row", you could do something like:
mArray.add(new ArrayList<Object>());
Check out the List interface to see what you can do with Lists in Java and which classes implement the interface (or roll your own!).
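For the shifting part of the question, a sketch using standard List operations: Collections.rotate shifts with wrap-around, add/remove at an index shift the remaining elements, and get/set handle the "save a temporary value, then overwrite" case. ShiftDemo and shiftRight are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ShiftDemo {
    // Hypothetical helper: non-wrapping shift right by one. Inserts `fill` at the
    // front and drops the last element, so the row keeps its length.
    static <T> void shiftRight(List<T> row, T fill) {
        row.add(0, fill);
        row.remove(row.size() - 1);
    }

    public static void main(String[] args) {
        List<Integer> row = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));

        Collections.rotate(row, 1);    // wrap-around shift right: [5, 1, 2, 3, 4]
        System.out.println(row);

        Collections.rotate(row, -2);   // wrap-around shift left by two: [2, 3, 4, 5, 1]
        System.out.println(row);

        shiftRight(row, 0);            // [0, 2, 3, 4, 5]
        System.out.println(row);

        // Save a temporary value, then overwrite in place (a swap)
        int tmp = row.get(1);
        row.set(1, row.get(4));
        row.set(4, tmp);               // [0, 5, 3, 4, 2]
        System.out.println(row);
    }
}
```

On an ArrayList, inserting or removing at the front moves every later element, so each such shift is O(n) in the row length; for occasional edits that is still far cheaper than rebuilding the whole 2-D structure.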
Upvotes: 0