Paul Jackson
Paul Jackson

Reputation: 2147

Data structure for a multidimensional array where number of dimensions are determined at run-time

I need to process data that has a set of attributes where the number of attributes will be determined at run time. For example, a data set may contain animals, and attributes might include gender, species, age, etc. where each attribute can be represented by an integer (or an enum). I want to be able to iterate along any dimension so that I can, say, quickly calculate the total number of males, or number of dogs, etc.

I am thinking of a Java interface like this:

public interface DynamicMultidimensionalStore<T>
{
  Object getPoint(List<Integer> coordinates);
  void setPoint(List<Integer> coordinates, T item);
  Iterator<T> iterate(int dimension, List<Integer> remainingCoordinates);
  DynamicMultidimensionalStore<T> getSlice(int dimension, int offset);
}

First, there must be a name for this; Cube? I see it is similar to http://en.wikipedia.org/wiki/Spatial_index#Spatial_index but these seem more focused on spatial relationships rather than iterating over arbitrary axes.

The only structure I can think of is a class that stores the data in a linear array and performs pointer arithmetic to calculate the offsets.

Are there better solutions? I think my approach would becomes less efficient as the array becomes more sparse (or as dimensions increase).

Upvotes: 0

Views: 840

Answers (1)

Rupert Swarbrick
Rupert Swarbrick

Reputation: 2803

If I've understood your question correctly, a "sparse solution" that would work is as follows. Represent your data set as a list of dictionaries, one for each variable. Store an item by inserting a reference to it into each dictionary, keyed by the relevant property. So you'd end up with data like

{
  feet = {1: {<slug>}, 2: {<bird>, <person>}, 4: {<dog>}},
  fur  = {yes: {<dog>}, no: {<slug>, <bird>, <person>}},
  ...
}

Here, <slug> should be read as a reference / pointer to a single instance of your object type. I don't know much about Java so I can't be specific there, but an implementation in C++, say, could use std::map keyed on possible values of the parameters. Then the values would be stored as some general collection: either std::list or maybe std::set. If you were being fancier, maybe std::multimap would be more suited - I'm not entirely sure.

Counting objects with a given property should be extremely fast: you would be querying the length of some container that you look up in a hash table. The major downside is that you have n*k pointers (or references or or ...) where n is the number of objects and k is the number of axes. This may or may not be OK for you.

Upvotes: 1

Related Questions