pix
pix

Reputation: 5070

Out of bounds assignment to a python array - am I reinventing a wheel?

I need to build up an array, but I receive my data out-of-order (and I don't know what the highest index of the array will be) so I need a way of doing array[index]=item when the index is frequently out of bounds.

I quickly threw together this function that does what I want, but I feel there might be an easier way.

def oob_assign(array,index,item,default):
  "set array[index] to item. if index is out of bounds, array is extended as necessary using default"
  array.extend([default]*(index-len(array)+1))
  array[index]=(item)

So, for example:

In [4]: a=[]

In [5]: oob_assign(a,5,"five",0)

In [6]: a
Out[6]: [0, 0, 0, 0, 0, 'five']

In [7]: a[5]
Out[7]: 'five'

Edit: While my end goal is a bit too much to ask in a stackoverflow question, the operations I need to do (relatively quickly) on the resulting data are:

The data-set is small enough (~1000 elements) that the memory usage of an array are not an issue.

Edit: Thanks for all of the great answers! I <3 Stackoverflow :)

Upvotes: 4

Views: 2538

Answers (3)

abarnert
abarnert

Reputation: 365835

If you want to keep this sparse (that is, if you set index 50 when the previous highest index was 25, you only want to create 1 new element instead of 25), the obvious answer here is a dict, plus a "largest index seen" int.

There is a tradeoff here. Obviously a real list will be faster for some operations, and use less storage when the list is actually dense rather than sparse, and will help us more with implementing things like complex slices, and so on.

But if you were thinking of using a defaultdict, this is the direction you're looking. A defaultdict does a lot of the extra stuff you need, but it won't let you iterate over the values, or give the right len, and it will create and store new values whiner you ask for them instead of leaving them sparse. You could add those things… but really, if you can't use defaultdict directly, it's not helping much; we already know where we want to defaultify and where we don't, and it's already inside a wrapper, so…

To build a custom list-like object, you can just define a handful of methods and let collections.abc.MutableSequence define the rest for you.

class ExpandoList(collections.abc.MutableSequence):
    def __init__(self):
        self.d = {}
        self.maxidx = -1
    def __setitem__(self, idx, value):
        self.maxidx = max(idx, self.maxidx)
        self.d[idx] = value
    def __getitem__(self, idx):
        self.maxidx = max(idx, self.maxidx)
        return self.d.get(idx, 0)
    def __delitem__(self, idx):
        for i in range(i, self.maxidx):
            self.d[i] = self.d[i-1]
        self.maxidx -= 1
    def insert(self, idx, value):
        for i in reversed(range(i, self.maxidx)):
            self.d[i] = self.d[i-1]
        self.d[idx] = value
        self.maxidx += 1
    def __iter__(self):
        yield from (self[i] for i in range(self.maxidx))
    def __len__(self):
        return self.maxidx

Note that this doesn't do slicing. You can handle that manually, but it gets more complicated than I'd want to put into an answer. If you want to take this further, I've got a lazylist class I can upload somewhere that shows how to solve the same problems you'd need to solve.

Upvotes: 1

Kirk Strauser
Kirk Strauser

Reputation: 30947

Here's a self-extending list for you:

class OOBList(list):
    def __init__(self, default, *args, **kwargs):
        super(OOBList, self).__init__(*args, **kwargs)
        self.default = default

    def __setitem__(self, index, value):
        max_index = len(self) - 1
        if index > max_index:
            self.extend([self.default] * (index - max_index))
        super(OOBList, self).__setitem__(index, value)

When assigning to an index beyond the end of the list, it extends the list to be long enough to hold it.

Since it subclasses list, there's no sorting required for display (but repeatedly extending the list a little at a time might be expensive).

Upvotes: 2

wflynny
wflynny

Reputation: 18521

Can you store both the (index, data_value) in a list via append? Then you can sort the list by index value:

data_items = []
for index, data_value in generate_out_of_order_data():
    data_items.append((index, data_value))
data_items.sort()
indices, data = zip(*data_items)

For example:

In [1]: random_data = [(4, 'd'), (1, 'a'), (3, 'c'), (2, 'b')]
In [2]: data_items = []
In [3]: for index, data_value in random_data:
   ...:     data_items.append((index, data_value))
   ...: 
In [4]: data_items.sort()
In [5]: indices, data = zip(*data_items)
In [6]: indices
Out[6]: (1, 2, 3, 4)
In [7]: data
Out[7]: ('a', 'b', 'c', 'd')

Upvotes: 2

Related Questions