Reputation: 8131

Data structure for arrays which share some elements -- Python

I have a collection of arrays which "overlap" at certain elements. Here's a picture of an example involving 3 arrays of characters:

  array0↓
       'A'      ↓array2
array1→'B' 'D' 'E'
       'C'     'F'

The important thing is that changes to the arrays should respect this structure. So for example, if I change the 'B' in array0 to 'X', the 'B' in array1 should also change to 'X'.

My question is what is a good, efficient way of implementing this in Python?

There are two things I have thought of so far:

One, I can make a bespoke class, instances of which contain a completely distinct list, along with information about any overlaps that it has, and implement update methods appropriately so that any change to the list is always duplicated for the other lists at the overlaps. This seems a little overwrought though, and involves duplicating the data.

Two, I could do it by using singleton lists like this:

data = [['A'], ['B'], ['C'], ['D'], ['E'], ['F']]
array0 = [data[0], data[1], data[2]]
array1 = [data[1], data[3], data[4]]
array2 = [data[4], data[5]]

for array in array0, array1, array2:
     print(array)

>>> [['A'], ['B'], ['C']]
>>> [['B'], ['D'], ['E']]
>>> [['E'], ['F']]

array0[1][0] = 'X'

for array in array0, array1, array2:
     print(array)

>>> [['A'], ['X'], ['C']]
>>> [['X'], ['D'], ['E']]
>>> [['E'], ['F']]

But I feel this may be hacky and not the best way. Thanks for any suggestions.

Upvotes: 14

Answers (5)

Dani Mesejo

Reputation: 61910

Mine suggestion is variation of the one proposed by @a_guest. You could have a wrapper class that marks the elements as shared and a data structure for handling such elements:

class SharedElement:
    def __init__(self, val):
        self.val = val

    def update(self, val):
        self.val = val

    def __repr__(self):
        return "SharedElement({0})".format(self.val)

    def __str__(self):
        return str(self.val)


class SharedList:
    def __init__(self, lst):
        self._lst = lst

    def __getitem__(self, item):
        if isinstance(self._lst[item], SharedElement):
            return self._lst[item].val
        return self._lst[item]

    def __setitem__(self, key, value):
        if isinstance(self._lst[key], SharedElement):
            self._lst[key].update(value)


B = SharedElement('B')
E = SharedElement('E')

a = SharedList(['A', B, 'C'])
b = SharedList([B, 'D', E])
c = SharedList([E, 'F'])

b[0] = 'X'

print([val for val in a])
print([val for val in b])
print([val for val in c])

Output

['A', 'X', 'C']
['X', 'D', 'E']
['E', 'F']

Upvotes: 2

a_guest

Reputation: 36249

You could use a dedicated class that updates other intersecting instances appropriately as you indicated with your first idea. I wouldn't consider data duplication a problem as for mutable data you anyway store the references and in case your using large immutable data you can employ a dedicated wrapper class (e.g. Python 3.7 introduced the @dataclass decorator).

Here is an example implementation:

from collections import defaultdict

class List(list):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._intersections = defaultdict(list)

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        for i, other in self._intersections[index]:
            other[i] = value

    def intersect(self, other, at):
        self._intersections[at[0]].append((at[1], other))

With that you can intersect the lists as in your example:

a = List(['A', 'B', 'C'])
b = List(['B', 'D', 'E'])
c = List(['E', 'F'])

a.intersect(b, (1, 0))
b.intersect(c, (2, 0))

a[1] = 'X'
b[2] = 'Y'
print(a)
print(b)
print(c)

Which gives as output:

['A', 'X', 'C']
['X', 'D', 'Y']
['Y', 'F']

Upvotes: 1

a_guest

Reputation: 36249

You could subclass list and use a dedicated wrapper class to proxy shared content. This involves no data duplication as it only stores the proxy for shared data which dispatches to the original data. It's a bit similar to your nested list approach but it maintains the normal list interface. Here is an example implementation:

class Intersection:
    def __init__(self, other, index):
        self.other = other
        self.index = index

    def __repr__(self):
        return repr(self.other[self.index])

    @property
    def value(self):
        return self.other[self.index]

    @value.setter
    def value(self, v):
        self.other[self.index] = v


class List(list):
    def __getitem__(self, index):
        item = super().__getitem__(index)
        return item.value if isinstance(item, Intersection) else item

    def __setitem__(self, index, value):
        item = super().__getitem__(index)
        if isinstance(item, Intersection):
            item.value = value
        else:
            super().__setitem__(index, value)

    def share(self, index):
        return Intersection(self, index)

Now you can share the data among your lists as required:

a = List(['A', 'B', 'C'])
b = List([a.share(1), 'D', 'E'])
c = List([b.share(2), 'F'])

a[1] = 'X'
b[2] = 'Y'
print(a)
print(b)
print(c)

Which gives as output:

['A', 'X', 'C']
['X', 'D', 'Y']
['Y', 'F']

Upvotes: 1

Lorenzo

Reputation: 162

As you pointed out in your question, the relevant information is that

array0ptr = [0, 1, 2]
array1ptr = [1, 3, 4]
array2ptr = [4, 5]

(I am adding the suffix ptr because practically those elements are pointers). Here the list element are the pointer to the objects that shall be maintained in a separate list

ol = ['A', 'B', 'C', 'D', 'E']

The real arrays can be obtained at run time by member functions like

array0 = []
for i in range(len(array0ptr)):
    array0.append(ol[array0ptr[i]])

Now your point is : suppose that the object list becomes

ol = ['A', 'B', 'intruder', 'C', 'D', 'E']

How do I automagically keep track of this in my arrays?? Those arrays should become:

array0ptr = [0, 1, 3]
array1ptr = [1, 4, 5]
array2ptr = [5, 6]

I think that the most simple answer is : keep the list fixed!, and do not allow inserting or changing the order of items. Simply mantain a different hash with the object position. In the case above, you'll have

sl = ['A', 'B', 'C', 'D', 'E', 'intruder']
slorder = [0, 1, 3, 4, 5, 2]

it is then possible to write member functions that dump the updated list of objects, the array would not change. What can be tricky is if you want to delete objects, but this is tricky in any case I fear.

Upvotes: 0

Ajax1234

Reputation: 71451

You can create a wrapper class that can handle the updating of all elements of the same value:

arr = [[['A'], ['B'], ['C']], [['B'], ['D'], ['E']], [['E'], ['F']]]
class WrapArray:
  def __init__(self, _data):
    self.d = _data
  def __getitem__(self, _x):
    self.x = _x
    class _wrapper:
      def __init__(self, _inst):
         self.ref = _inst
      def __setitem__(self, _y, _val):
         _place = self.ref.d[self.ref.x][_y][0]
         self.ref.d[self.ref.x][_y][0] = _val
         for i in range(len(self.ref.d)):
           for b in range(len(self.ref.d[i])):
             if self.ref.d[i][b][0] == _place:
               self.ref.d[i][b] = [_val]
    return _wrapper(self)
  def __repr__(self):
    return str(self.d)

array = WrapArray(arr)
array[1][0] = 'X'

Output:

[[['A'], ['X'], ['C']], [['X'], ['D'], ['E']], [['E'], ['F']]]

Upvotes: 1

Data structure for arrays which share some elements -- Python

Answers (5)

Related Questions