mathematical.coffee

Reputation: 56915

Python: pick appropriate datatype size (int) automatically

I'm allocating a (possibly large) matrix of zeros with Python and numpy. I plan to put unsigned integers from 1 to N in it.

N is quite variable: could easily range from 1 all the way up to a million, perhaps even more.

I know N prior to matrix initialisation. How can I choose the data type of my matrix so that it is guaranteed to hold (unsigned) integers up to N?

Furthermore, I want to pick the smallest such data type that will do.

For example, if N was 1000, I'd pick np.dtype('uint16'). If N is 240, uint16 would work, but uint8 would also work and is the smallest data type I can use to hold the numbers.

This is how I initialise the array. I'm looking for the SOMETHING_DEPENDING_ON_N:

import numpy as np
# N is known by some other calculation.
lbls = np.zeros( (10,20), dtype=np.dtype( SOMETHING_DEPENDING_ON_N ) )

cheers!

Aha!

Just realised numpy v1.6.0+ has np.min_scalar_type (see its documentation). D'oh! (Although the answers are still useful because I don't have 1.6.0.)
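For readers who do have NumPy 1.6.0 or later, here is a minimal sketch of how np.min_scalar_type could fill in the SOMETHING_DEPENDING_ON_N slot from the question. The helper name make_label_array and its default shape are illustrative assumptions, not part of the original post.

```python
import numpy as np

def make_label_array(N, shape=(10, 20)):
    """Allocate zeros using the smallest dtype that can represent N."""
    # For a non-negative integer, np.min_scalar_type returns an unsigned type.
    return np.zeros(shape, dtype=np.min_scalar_type(N))

for N in (240, 1000, 10**6):
    print(N, make_label_array(N).dtype)
```

With the values from the question, 240 should map to uint8 and 1000 to uint16; a million needs uint32.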

Upvotes: 8

Views: 2684

Answers (4)

Ali

Reputation: 21

I wrote this code for myself and I think it is more general.

import numpy as np

def np_choose_optimal_dtype(arr, return_dtype=False):
    """
    Return arr cast to the smallest integer dtype that holds all its values,
    or just that dtype if return_dtype is True.
    """
    assert np.array_equal(np.floor(arr), arr), 'np array must be integer'
    min_val = np.min(arr)
    max_val = np.max(arr)
    type_list = [np.uint8, np.uint16, np.uint32, np.uint64]
    if min_val < 0:
        type_list = [np.int8, np.int16, np.int32, np.int64]
    for d_type in type_list:
        if np.iinfo(d_type).min <= min_val and np.iinfo(d_type).max >= max_val:
            if return_dtype:
                return d_type
            return np.array(arr, dtype=d_type)
            
    raise ValueError('Could not find a dtype for the array.')

Upvotes: 0

wim

Reputation: 362657

What about writing a simple function to do the job?

import numpy as np

def type_chooser(N):
    for dtype in [np.uint8, np.uint16, np.uint32, np.uint64]:
        if N <= np.iinfo(dtype).max:  # largest value this dtype can hold
            return dtype
    raise Exception('{} is really big!'.format(N))

Example usage:

>>> type_chooser(255)
<type 'numpy.uint8'>
>>> type_chooser(256)
<type 'numpy.uint16'>
>>> type_chooser(18446744073709551615)
<type 'numpy.uint64'>
>>> type_chooser(18446744073709551616)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "spam.py", line 6, in type_chooser
    raise Exception('{} is really big!'.format(N))
Exception: 18446744073709551616 is really big!

Upvotes: 4

mathematical.coffee

Reputation: 56915

For interest, here is the version I had been toying with until @Ignacio Vazquez-Abrams and @wim posted their answers, using bitshifts:

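import numpy as np

def minimal_uint_type(N):
    bases = [8, 16, 32, 64]
    # N >> b is nonzero exactly when N needs more than b bits.
    too_small = sum(1 for b in bases if N >> b)
    try:
        return np.dtype('uint{}'.format(bases[too_small]))
    except IndexError:
        raise ValueError('{} is really big!'.format(N))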
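A related way to express the same bit-counting idea is Python's built-in int.bit_length(), which reports how many bits a non-negative integer needs. The sketch below is only an illustration under that assumption; the name minimal_uint_type_bits is hypothetical and not from the answer.

```python
import numpy as np

def minimal_uint_type_bits(N):
    """Hypothetical helper: smallest unsigned dtype, found by counting bits."""
    N = int(N)
    if N < 0:
        raise ValueError('N must be non-negative')
    needed = N.bit_length()  # 0 for N == 0, 8 for 255, 9 for 256
    for width in (8, 16, 32, 64):
        if needed <= width:
            return np.dtype('uint{}'.format(width))
    raise ValueError('{} does not fit in 64 bits'.format(N))
```

This avoids building the list of shifted values, at the cost of an explicit loop over the candidate widths.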

Upvotes: 0

Ignacio Vazquez-Abrams

Reputation: 798626

Create a mapping from each type's exclusive upper bound (its maximum value plus one) to the type, and then look up the smallest key larger than N.

typemap = {
  256: np.uint8,
  65536: np.uint16,
   ...
}

return typemap.get(min(x for x in typemap if x > N))
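The mapping approach above can be sketched as a complete function. Only the uint8 and uint16 entries appear in the answer; the uint32 and uint64 rows, the names TYPEMAP and smallest_uint_type, and the choice of ValueError are assumptions made here for illustration.

```python
import numpy as np

# Exclusive upper bound -> dtype. The first two rows come from the answer;
# the uint32 and uint64 rows are assumed.
TYPEMAP = {
    2**8: np.uint8,
    2**16: np.uint16,
    2**32: np.uint32,
    2**64: np.uint64,
}

def smallest_uint_type(N):
    """Return the type whose bound is the smallest key strictly above N."""
    bounds = [bound for bound in TYPEMAP if bound > N]
    if not bounds:
        raise ValueError('{} does not fit in any mapped type'.format(N))
    return TYPEMAP[min(bounds)]
```

Checking for an empty candidate list first avoids calling min() on an empty sequence, which would otherwise raise a less helpful error when N is too large.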

Upvotes: 1
