Reputation: 2147
I need to process data that has a set of attributes where the number of attributes will be determined at run time. For example, a data set may contain animals, and attributes might include gender, species, age, etc. where each attribute can be represented by an integer (or an enum). I want to be able to iterate along any dimension so that I can, say, quickly calculate the total number of males, or number of dogs, etc.
I am thinking of a Java interface like this:
public interface DynamicMultidimensionalStore<T>
{
Object getPoint(List<Integer> coordinates);
void setPoint(List<Integer> coordinates, T item);
Iterator<T> iterate(int dimension, List<Integer> remainingCoordinates);
DynamicMultidimensionalStore<T> getSlice(int dimension, int offset);
}
First, there must be a name for this; Cube? I see it is similar to http://en.wikipedia.org/wiki/Spatial_index#Spatial_index but these seem more focused on spatial relationships rather than iterating over arbitrary axes.
The only structure I can think of is a class that stores the data in a linear array and performs pointer arithmetic to calculate the offsets.
Are there better solutions? I think my approach would becomes less efficient as the array becomes more sparse (or as dimensions increase).
Upvotes: 0
Views: 840
Reputation: 2803
If I've understood your question correctly, a "sparse solution" that would work is as follows. Represent your data set as a list of dictionaries, one for each variable. Store an item by inserting a reference to it into each dictionary, keyed by the relevant property. So you'd end up with data like
{
feet = {1: {<slug>}, 2: {<bird>, <person>}, 4: {<dog>}},
fur = {yes: {<dog>}, no: {<slug>, <bird>, <person>}},
...
}
Here, <slug>
should be read as a reference / pointer to a single instance of your object type. I don't know much about Java so I can't be specific there, but an implementation in C++, say, could use std::map
keyed on possible values of the parameters. Then the values would be stored as some general collection: either std::list
or maybe std::set
. If you were being fancier, maybe std::multimap
would be more suited - I'm not entirely sure.
Counting objects with a given property should be extremely fast: you would be querying the length of some container that you look up in a hash table. The major downside is that you have n*k
pointers (or references or or ...) where n
is the number of objects and k
is the number of axes. This may or may not be OK for you.
Upvotes: 1