dseeley
dseeley

Reputation: 137

Best data structures for 3D array in Python

I am developing a moving average filter for position "tracks" in Touch Designer, which implements a Python runtime. I'm new to Python and I'm unclear on the best data structures to use. The pcode is roughly:

  1. Receive a new list of tracks formatted as:

    id, posX, posY
    2001, 0.54, 0.21
    2002, 0.43, 0.23
    ...

  2. Add incoming X and Y values to an existing data structure keyed on "id," if that id is already present
  3. Create new entries for new ids
  4. Remove any id entries that are not present in the incoming data
  5. Return a moving average of X and Y values per id

Question: Would it be a good idea to do this as a hashtable where the key is id, and the values are a list of lists? eg:

ids = {2001: posVals, 2002: posVals2}

where posVals is a list of [x,y] pairs?

I think this is like a 3D array, but where I want to use id as a key for lots of operations.

Thanks

Upvotes: 1

Views: 999

Answers (2)

abarnert
abarnert

Reputation: 365787

First, if the IDs are all relatively small positive integers, and the array is almost completely dense (that is, almost all IDs will exist), a dict is just adding extra overhead. On the other hand, if there are large numbers, or large gaps, or keys of different types, using a dict as a "sparse array" makes perfect sense.

Meanwhile, for the other two dimensions… how many IDs are you going to have, and how many pairs for each? If we're talking a handful of pairs per ID and a few thousands total pairs across all IDs, a list of pairs per ID is perfectly fine (although I'd probably represent each pair as a tuple rather than a list), and anything else would be adding unnecessary complexity. But if there are going to be a lot of pairs per ID, or a lot total, you may run into problems with storage, or performance.

If you can use the third-party library numpy, it can store a 2D array of numbers in much less memory than a list of pairs of numbers, and it can perform calculations like moving averages with much briefer/more readable code, and with much less CPU time. In fact, it can even store a sparse 3D array for you.

If you can only use the standard library, the array module can get you most of the same memory benefits, but without the simplicity benefits (in fact, your code becomes slightly more complex, as you have to represent the 2D array as a 1D array, old-school-C-style—although you can wrap that up pretty easily), or the time performance benefits, and it can't help with the sparseness.

Upvotes: 1

Paul Whalen
Paul Whalen

Reputation: 451

Yes, this is how I would do it. It's very intuitive this way, assuming you're always looking things up by their id and don't need to sort in some other way.

Also, the terminology in Python is dict (as in dictionary), rather than hashtable.

Upvotes: 0

Related Questions