Justin8051

Reputation: 429

The most efficient way to store a very large 2D array in Python/MicroPython

I have a project in an embedded system (NodeMCU running MicroPython), where I need to store a very large array of variables, which have values of either 0 or 1. I need to be able to read/write them individually or via loops in a convenient way. For this example, I am filling the array with random integers between 0 and 1:

import random

N = 50
table = [[random.randint(0, 1) for i in range(N)] for j in range(N)]

On my NodeMCU, even such a small array (2,500 items) is enough to exceed the memory limits, crashing my script. I suppose this is because in Python an int is an object with a lot of overhead. Since in my case I do not need the full capacity of an int - actually, a 0 or 1 could be stored as a single bit - how can I create and fill an array with the least memory-consuming values, say, randomized between 0 and 1 as in this example? I reviewed uctypes, but as I'm new to Python, I couldn't get it to work. Or is there another way? How can I create such an array with the least memory usage possible?

Upvotes: 2

Views: 1978

Answers (2)

Russ Hughes

Reputation: 174

I would use a bytearray to store the data. You can store 8 bits per byte by calculating which byte, and which bit within that byte, corresponds to a particular row and column. Once you have those indexes, you can use bit shifts and bitwise operators to set, clear, or retrieve the value for that row and column.

class bitarray2d:
    """create a 2d array of bits, cols wide and rows high"""
    def __init__(self, cols, rows):
        # round up so grids whose size is not a multiple of 8 still fit
        self.bits = bytearray((cols * rows + 7) >> 3)
        self.cols = cols
        self.rows = rows

    def set(self, column, row):
        """set the bit that corresponds to the given column and row"""
        bit = (row * self.cols) + column
        self.bits[bit >> 3] |= 1 << (7 - (bit % 8))

    def clear(self, column, row):
        """clear the bit that corresponds to the given column and row"""
        bit = (row * self.cols) + column
        self.bits[bit >> 3] &= ~(1 << (7 - (bit % 8)))

    def get(self, column, row):
        """get the value of the bit that corresponds to the given column and row"""
        bit = (row * self.cols) + column
        return 1 if self.bits[bit >> 3] & (1 << (7 - (bit % 8))) else 0

    def print(self):
        """print the bitarray as a 2d array"""
        for row in range(self.rows):
            for column in range(self.cols):
                print(self.get(column, row), end="")
            print()

    def line(self, x0, y0, x1, y1):
        """draw a line starting at x0, y0 and ending at x1, y1"""
        steep = abs(y1 - y0) > abs(x1 - x0)
        if steep:
            x0, y0 = y0, x0
            x1, y1 = y1, x1
        if x0 > x1:
            x0, x1 = x1, x0
            y0, y1 = y1, y0
        dx = x1 - x0
        dy = abs(y1 - y0)
        err = dx >> 1
        ystep = 1 if y0 < y1 else -1
        while x0 <= x1:
            if steep:
                self.set(y0, x0)
            else:
                self.set(x0, y0)
            err -= dy
            if err < 0:
                y0 += ystep
                err += dx
            x0 += 1

# Test routine
columns = 64
rows = 64

bits = bitarray2d(columns, rows)
bits.line(0, 0, columns-1, rows-1)
bits.line(columns-1, 0, 0, rows-1)
bits.print()

Upvotes: 1

juanpa.arrivillaga

Reputation: 95873

You are correct that int objects have a lot of overhead. However, in this case these int objects may actually be cached (in CPython they would be), so the only per-element overhead should be the pointer...

However, a pointer still requires a machine word. To avoid relying on such implementation details, and to pack things more tightly, you could use actual arrays instead of the list of list objects you are currently using.
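To put rough numbers on that (a CPython sketch; exact sizes vary by build, and MicroPython may not provide sys.getsizeof on every port):

```python
import array
import sys

N = 1000

# a list stores one machine-word pointer per element,
# while array.array('B') stores one byte per element
as_list = [1] * N
as_array = array.array('B', as_list)

# container sizes only; sys.getsizeof does not follow the
# pointers in the list, so the real gap is even larger
print(sys.getsizeof(as_list))   # roughly 8 bytes per element on a 64-bit build
print(sys.getsizeof(as_array))  # roughly 1 byte per element plus a small header
```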

The array module provides an object-oriented wrapper around primitive, C-like numeric arrays. Unfortunately, it does not provide multidimensional arrays.

So you could try something like:

import array
import random

N = 50

def build_table(N):
    rng = range(N)
    result = []
    for _ in rng:
        arr = array.array('B')  # unsigned byte
        for _ in rng:
            arr.append(random.randint(0,1))
        result.append(arr)
    return result

table = build_table(N)

If this were CPython, I would suggest the bitarray module for maximum efficiency. I have no idea whether it is available for MicroPython, but you could implement something like it yourself, likely on top of an array.array. There are many examples of this; it is something of a classic data structure from the days when memory was measured in bytes. Here's just one example from the Python wiki.
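A minimal sketch of such a bit array on top of array.array (the BitArray name and bit ordering are my own choices, not the bitarray module's API; MicroPython's array module supports the same 'B' typecode):

```python
import array

class BitArray:
    """Flat array of bits packed into an array.array of unsigned bytes."""
    def __init__(self, size):
        self.size = size
        # round up to a whole number of bytes
        self.bytes = array.array('B', bytes((size + 7) >> 3))

    def __getitem__(self, i):
        # byte index is i // 8, bit index within the byte is i % 8
        return (self.bytes[i >> 3] >> (i & 7)) & 1

    def __setitem__(self, i, value):
        if value:
            self.bytes[i >> 3] |= 1 << (i & 7)
        else:
            self.bytes[i >> 3] &= ~(1 << (i & 7))

bits = BitArray(2500)   # a 50 x 50 grid needs only 313 bytes
bits[0] = 1
bits[2499] = 1
print(bits[0], bits[1], bits[2499])  # 1 0 1
```

A 2D view is then just index arithmetic: bit i = row * cols + column, exactly as in the other answer.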

Upvotes: 0
