Reputation: 16737
np.searchsorted
only for 1D arrays.
I have a lexicographically sorted 2D array, meaning that 0-th row is sorted, then for same values of 0-th row corresponding elements of 1-th row are sorted too, for same values of 1-th row values of 2-th row are sorted too. In other words tuples consisting of columns are sorted.
I have some other 2D array with tuples-columns that need to be inserted into first 2D array into correct positions of columns. For 1D case np.searchsorted
was usually used in order to find correct positions.
But for 2D array is there an alternative to np.searchsorted
? Something analagous to how np.lexsort is a 2D alternative for 1D np.argsort.
If no such function then can be this functionality implemented in an efficient way using existing numpy functions?
I am interested in efficient solutions for arrays of any dtype
including np.object_
.
One naive way to handle any dtype
case would be to convert each column of both arrays to 1D array (or tuple) and then store these columns as another 1D array of dtype = np.object_
. Maybe it is not that naive and could be even fast especially if columns are quite high.
Upvotes: 3
Views: 1971
Reputation: 16737
Posting first naive solution that I mentioned in my question, it just converts 2D array to 1D array of dtype = np.object_
containing original columns as Python tuples, then uses 1D np.searchsorted
, solution works for any dtype
. In fact this solution is not that naive, its quite fast, as measured in my other answer to current question, especially it is fast for keys lengths below 100.
import numpy as np
np.random.seed(0)
def to_obj(x):
res = np.empty((x.shape[0],), dtype = np.object_)
res[:] = [tuple(np.squeeze(e, 0)) for e in np.split(x, x.shape[0], axis = 0)]
return res
a = np.random.randint(0, 3, (10, 23))
b = np.random.randint(0, 3, (10, 15))
a, b = [x[:, np.lexsort(x[::-1])] for x in (a, b)]
print(np.concatenate((np.arange(a.shape[1])[None, :], a)), '\n\n', b, '\n')
a, b = [to_obj(x.T) for x in (a, b)]
print(np.searchsorted(a, b))
Upvotes: 0
Reputation: 16737
I've created several more advanced strategies.
Also simple strategy using tuples
like in another my answer is implemented.
Timings of all solutions are measured.
Most of strategies are using np.searchsorted
as underlying engine. To implement these advanced strategies a special wrapping class _CmpIx
was used in order to provide custom comparison function (__lt__
) for np.searchsorted
call.
py.tuples
strategy just converts all columns to tuples and stores them as numpy 1D array of np.object_
dtype and then doing regular searchsorting.py.zip
uses python's zip for lazily doing same task.np.lexsort
strategy just uses np.lexsort
in order to compare two columns lexicographically.np.nonzero
uses np.flatnonzero(a != b)
expression.cmp_numba
uses ahead of time compiled numba code inside _CmpIx
wrapper for fast lexicographically lazy comparing of two provided elements.np.searchsorted
uses standard numpy's function but is measured for 1D case only.numba
strategy whole search algorithm is implemented from scratch using Numba engine, algorithm is based on binary search. There is _py
and _nm
variants of this algorithm, _nm
is much faster as it uses Numba compiler, while _py
is same algorithm but un-compiled. Also there is _sorted
flavor which does extra optimization of array to be inserted is already sorted.view1d
- methods suggested by @MadPhysicist in this answer. Commented out them in code, because they were returning incorrect answers for most of tests for all key lengths >1, probably due to some problems of raw viewing into array.class SearchSorted2D:
class _CmpIx:
def __init__(self, t, p, i):
self.p, self.i = p, i
self.leg = self.leg_cache()[t]
self.lt = lambda o: self.leg(self, o, False) if self.i != o.i else False
self.le = lambda o: self.leg(self, o, True) if self.i != o.i else True
@classmethod
def leg_cache(cls):
if not hasattr(cls, 'leg_cache_data'):
cls.leg_cache_data = {
'py.zip': cls._leg_py_zip, 'np.lexsort': cls._leg_np_lexsort,
'np.nonzero': cls._leg_np_nonzero, 'cmp_numba': cls._leg_numba_create(),
}
return cls.leg_cache_data
def __eq__(self, o): return not self.lt(o) and self.le(o)
def __ne__(self, o): return self.lt(o) or not self.le(o)
def __lt__(self, o): return self.lt(o)
def __le__(self, o): return self.le(o)
def __gt__(self, o): return not self.le(o)
def __ge__(self, o): return not self.lt(o)
@staticmethod
def _leg_np_lexsort(self, o, eq):
import numpy as np
ia, ib = (self.i, o.i) if eq else (o.i, self.i)
return (np.lexsort(self.p.ab[::-1, ia : (ib + (-1, 1)[ib >= ia], None)[ib == 0] : ib - ia])[0] == 0) == eq
@staticmethod
def _leg_py_zip(self, o, eq):
for l, r in zip(self.p.ab[:, self.i], self.p.ab[:, o.i]):
if l < r:
return True
if l > r:
return False
return eq
@staticmethod
def _leg_np_nonzero(self, o, eq):
import numpy as np
a, b = self.p.ab[:, self.i], self.p.ab[:, o.i]
ix = np.flatnonzero(a != b)
return a[ix[0]] < b[ix[0]] if ix.size != 0 else eq
@staticmethod
def _leg_numba_create():
import numpy as np
try:
from numba.pycc import CC
cc = CC('ss_numba_mod')
@cc.export('ss_numba_i8', 'b1(i8[:],i8[:],b1)')
def ss_numba(a, b, eq):
for i in range(a.size):
if a[i] < b[i]:
return True
elif b[i] < a[i]:
return False
return eq
cc.compile()
success = True
except:
success = False
if success:
try:
import ss_numba_mod
except:
success = False
def odo(self, o, eq):
a, b = self.p.ab[:, self.i], self.p.ab[:, o.i]
assert a.ndim == 1 and a.shape == b.shape, (a.shape, b.shape)
return ss_numba_mod.ss_numba_i8(a, b, eq)
return odo if success else None
def __init__(self, type_):
import numpy as np
self.type_ = type_
self.ci = np.array([], dtype = np.object_)
def __call__(self, a, b, *pargs, **nargs):
import numpy as np
self.ab = np.concatenate((a, b), axis = 1)
self._grow(self.ab.shape[1])
ix = np.searchsorted(self.ci[:a.shape[1]], self.ci[a.shape[1] : a.shape[1] + b.shape[1]], *pargs, **nargs)
return ix
def _grow(self, to):
import numpy as np
if self.ci.size >= to:
return
import math
to = 1 << math.ceil(math.log(to) / math.log(2))
self.ci = np.concatenate((self.ci, [self._CmpIx(self.type_, self, i) for i in range(self.ci.size, to)]))
class SearchSorted2DNumba:
@classmethod
def do(cls, a, v, side = 'left', *, vsorted = False, numba_ = True):
import numpy as np
if not hasattr(cls, '_ido_numba'):
def _ido_regular(a, b, vsorted, lrt):
nk, na, nb = a.shape[0], a.shape[1], b.shape[1]
res = np.zeros((2, nb), dtype = np.int64)
max_depth = 0
if nb == 0:
return res, max_depth
#lb, le, rb, re = 0, 0, 0, 0
lrb, lre = 0, 0
if vsorted:
brngs = np.zeros((nb, 6), dtype = np.int64)
brngs[0, :4] = (-1, 0, nb >> 1, nb)
i, j, size = 0, 1, 1
while i < j:
for k in range(i, j):
cbrng = brngs[k]
bp, bb, bm, be = cbrng[:4]
if bb < bm:
brngs[size, :4] = (k, bb, (bb + bm) >> 1, bm)
size += 1
bmp1 = bm + 1
if bmp1 < be:
brngs[size, :4] = (k, bmp1, (bmp1 + be) >> 1, be)
size += 1
i, j = j, size
assert size == nb
brngs[:, 4:] = -1
for ibc in range(nb):
if not vsorted:
ib, lrb, lre = ibc, 0, na
else:
ibpi, ib = int(brngs[ibc, 0]), int(brngs[ibc, 2])
if ibpi == -1:
lrb, lre = 0, na
else:
ibp = int(brngs[ibpi, 2])
if ib < ibp:
lrb, lre = int(brngs[ibpi, 4]), int(res[1, ibp])
else:
lrb, lre = int(res[0, ibp]), int(brngs[ibpi, 5])
brngs[ibc, 4 : 6] = (lrb, lre)
assert lrb != -1 and lre != -1
for ik in range(nk):
if lrb >= lre:
if ik > max_depth:
max_depth = ik
break
bv = b[ik, ib]
# Binary searches
if nk != 1 or lrt == 2:
cb, ce = lrb, lre
while cb < ce:
cm = (cb + ce) >> 1
av = a[ik, cm]
if av < bv:
cb = cm + 1
elif bv < av:
ce = cm
else:
break
lrb, lre = cb, ce
if nk != 1 or lrt >= 1:
cb, ce = lrb, lre
while cb < ce:
cm = (cb + ce) >> 1
if not (bv < a[ik, cm]):
cb = cm + 1
else:
ce = cm
#rb, re = cb, ce
lre = ce
if nk != 1 or lrt == 0 or lrt == 2:
cb, ce = lrb, lre
while cb < ce:
cm = (cb + ce) >> 1
if a[ik, cm] < bv:
cb = cm + 1
else:
ce = cm
#lb, le = cb, ce
lrb = cb
#lrb, lre = lb, re
res[:, ib] = (lrb, lre)
return res, max_depth
cls._ido_regular = _ido_regular
import numba
cls._ido_numba = numba.jit(nopython = True, nogil = True, cache = True)(cls._ido_regular)
assert side in ['left', 'right', 'left_right'], side
a, v = np.array(a), np.array(v)
assert a.ndim == 2 and v.ndim == 2 and a.shape[0] == v.shape[0], (a.shape, v.shape)
res, max_depth = (cls._ido_numba if numba_ else cls._ido_regular)(
a, v, vsorted, {'left': 0, 'right': 1, 'left_right': 2}[side],
)
return res[0] if side == 'left' else res[1] if side == 'right' else res
def Test():
import time
import numpy as np
np.random.seed(0)
def round_float_fixed_str(x, n = 0):
if type(x) is int:
return str(x)
s = str(round(float(x), n))
if n > 0:
s += '0' * (n - (len(s) - 1 - s.rfind('.')))
return s
def to_tuples(x):
r = np.empty([x.shape[1]], dtype = np.object_)
r[:] = [tuple(e) for e in x.T]
return r
searchsorted2d = {
'py.zip': SearchSorted2D('py.zip'),
'np.nonzero': SearchSorted2D('np.nonzero'),
'np.lexsort': SearchSorted2D('np.lexsort'),
'cmp_numba': SearchSorted2D('cmp_numba'),
}
for iklen, klen in enumerate([1, 1, 2, 5, 10, 20, 50, 100, 200]):
times = {}
for side in ['left', 'right']:
a = np.zeros((klen, 0), dtype = np.int64)
tac = to_tuples(a)
for itest in range((15, 100)[iklen == 0]):
b = np.random.randint(0, (3, 100000)[iklen == 0], (klen, np.random.randint(1, (1000, 2000)[iklen == 0])), dtype = np.int64)
b = b[:, np.lexsort(b[::-1])]
if iklen == 0:
assert klen == 1, klen
ts = time.time()
ix1 = np.searchsorted(a[0], b[0], side = side)
te = time.time()
times['np.searchsorted'] = times.get('np.searchsorted', 0.) + te - ts
for cached in [False, True]:
ts = time.time()
tb = to_tuples(b)
ta = tac if cached else to_tuples(a)
ix1 = np.searchsorted(ta, tb, side = side)
if not cached:
ix0 = ix1
tac = np.insert(tac, ix0, tb) if cached else tac
te = time.time()
timesk = f'py.tuples{("", "_cached")[cached]}'
times[timesk] = times.get(timesk, 0.) + te - ts
for type_ in searchsorted2d.keys():
if iklen == 0 and type_ in ['np.nonzero', 'np.lexsort']:
continue
ss = searchsorted2d[type_]
try:
ts = time.time()
ix1 = ss(a, b, side = side)
te = time.time()
times[type_] = times.get(type_, 0.) + te - ts
assert np.array_equal(ix0, ix1)
except Exception:
times[type_ + '!failed'] = 0.
for numba_ in [False, True]:
for vsorted in [False, True]:
if numba_:
# Heat-up/pre-compile numba
SearchSorted2DNumba.do(a, b, side = side, vsorted = vsorted, numba_ = numba_)
ts = time.time()
ix1 = SearchSorted2DNumba.do(a, b, side = side, vsorted = vsorted, numba_ = numba_)
te = time.time()
timesk = f'numba{("_py", "_nm")[numba_]}{("", "_sorted")[vsorted]}'
times[timesk] = times.get(timesk, 0.) + te - ts
assert np.array_equal(ix0, ix1)
# View-1D methods suggested by @MadPhysicist
if False: # Commented out as working just some-times
aT, bT = np.copy(a.T), np.copy(b.T)
assert aT.ndim == 2 and bT.ndim == 2 and aT.shape[1] == klen and bT.shape[1] == klen, (aT.shape, bT.shape, klen)
for ty in ['if', 'cf']:
try:
dt = np.dtype({'if': [('', b.dtype)] * klen, 'cf': [('row', b.dtype, klen)]}[ty])
ts = time.time()
va = np.ndarray(aT.shape[:1], dtype = dt, buffer = aT)
vb = np.ndarray(bT.shape[:1], dtype = dt, buffer = bT)
ix1 = np.searchsorted(va, vb, side = side)
te = time.time()
assert np.array_equal(ix0, ix1), (ix0.shape, ix1.shape, ix0[:20], ix1[:20])
times[f'view1d_{ty}'] = times.get(f'view1d_{ty}', 0.) + te - ts
except Exception:
raise
a = np.insert(a, ix0, b, axis = 1)
stimes = ([f'key_len: {str(klen).rjust(3)}'] +
[f'{k}: {round_float_fixed_str(v, 4).rjust(7)}' for k, v in times.items()])
nlines = 4
print('-' * 50 + '\n' + ('', '!LARGE!:\n')[iklen == 0], end = '')
for i in range(nlines):
print(', '.join(stimes[len(stimes) * i // nlines : len(stimes) * (i + 1) // nlines]), flush = True)
Test()
outputs:
--------------------------------------------------
!LARGE!:
key_len: 1, np.searchsorted: 0.0250
py.tuples_cached: 3.3113, py.tuples: 30.5263, py.zip: 40.9785
cmp_numba: 25.7826, numba_py: 3.6673
numba_py_sorted: 6.8926, numba_nm: 0.0466, numba_nm_sorted: 0.0505
--------------------------------------------------
key_len: 1, py.tuples_cached: 0.1371
py.tuples: 0.4698, py.zip: 1.2005, np.nonzero: 4.7827
np.lexsort: 4.4672, cmp_numba: 1.0644, numba_py: 0.2748
numba_py_sorted: 0.5699, numba_nm: 0.0005, numba_nm_sorted: 0.0020
--------------------------------------------------
key_len: 2, py.tuples_cached: 0.1131
py.tuples: 0.3643, py.zip: 1.0670, np.nonzero: 4.5199
np.lexsort: 3.4595, cmp_numba: 0.8582, numba_py: 0.4958
numba_py_sorted: 0.6454, numba_nm: 0.0025, numba_nm_sorted: 0.0025
--------------------------------------------------
key_len: 5, py.tuples_cached: 0.1876
py.tuples: 0.4493, py.zip: 1.6342, np.nonzero: 5.5168
np.lexsort: 4.6086, cmp_numba: 1.0939, numba_py: 1.0607
numba_py_sorted: 0.9737, numba_nm: 0.0050, numba_nm_sorted: 0.0065
--------------------------------------------------
key_len: 10, py.tuples_cached: 0.6017
py.tuples: 1.2275, py.zip: 3.5276, np.nonzero: 13.5460
np.lexsort: 12.4183, cmp_numba: 2.5404, numba_py: 2.8334
numba_py_sorted: 2.3991, numba_nm: 0.0165, numba_nm_sorted: 0.0155
--------------------------------------------------
key_len: 20, py.tuples_cached: 0.8316
py.tuples: 1.3759, py.zip: 3.4238, np.nonzero: 13.7834
np.lexsort: 16.2164, cmp_numba: 2.4483, numba_py: 2.6405
numba_py_sorted: 2.2226, numba_nm: 0.0170, numba_nm_sorted: 0.0160
--------------------------------------------------
key_len: 50, py.tuples_cached: 1.0443
py.tuples: 1.4085, py.zip: 2.2475, np.nonzero: 9.1673
np.lexsort: 19.5266, cmp_numba: 1.6181, numba_py: 1.7731
numba_py_sorted: 1.4637, numba_nm: 0.0415, numba_nm_sorted: 0.0405
--------------------------------------------------
key_len: 100, py.tuples_cached: 2.0136
py.tuples: 2.5380, py.zip: 2.2279, np.nonzero: 9.2929
np.lexsort: 33.9505, cmp_numba: 1.5722, numba_py: 1.7158
numba_py_sorted: 1.4208, numba_nm: 0.0871, numba_nm_sorted: 0.0851
--------------------------------------------------
key_len: 200, py.tuples_cached: 3.5945
py.tuples: 4.1847, py.zip: 2.3553, np.nonzero: 11.3781
np.lexsort: 66.0104, cmp_numba: 1.8153, numba_py: 1.9449
numba_py_sorted: 1.6463, numba_nm: 0.1661, numba_nm_sorted: 0.1651
As it appears from timings numba_nm
implementation is the fastest, it outperforms next fastest (py.zip
or py.tuples_cached
) by 15-100x
times. And it has comparable speed (1.85x
slower) to standard np.searchsorted
for 1D case. Also it appeared to be that _sorted
flavor doesn't improve situation (i.e. using information about inserted array being sorted).
cmp_numba
method that is machine-code compiled appears to be around 1.5x
times faster on average than py.zip
that does same algorithm but in pure python. Due to average maximum equal-key depth being around 15-18
elements numba doesn't gain much speedup here. If depth was hundreds then numba code would probably have a huge speedup.
py.tuples_cached
strategy is faster than py.zip
for the case of key length <= 100
.
Also it appears to be that np.lexsort
is in fact very slow, either it is not optimized for the case of just two columns, or it spends time doing preprocessing like splitting rows into list, or it does non-lazy lexicographical comparison, the last case is probably the real reason as lexsort slows down with key length grow.
Strategy np.nonzero
is also non-lazy hence works slow too, and slows down with key length growth (but slows down not that fast as np.lexsort
does).
Timings above may be not precise, because my CPU slows down cores frequency 2-2.3 times at random times whenever it is overheated, and it overheats often because it is a powerful CPU inside laptop.
Upvotes: 3
Reputation: 114230
Two things can help you here: (1) you can sort and search structured arrays, and (2) if you have finite collections that can be mapped to integers, you can use that to your advantage.
Viewing as 1D
Let's say you have an array of strings that you want to insert into:
data = np.array([['a', '1'], ['a', 'z'], ['b', 'a']], dtype=object)
Since arrays are never ragged, you can construct a dtype that is the size of a row:
dt = np.dtype([('', data.dtype)] * data.shape[1])
Using my shamelessly plugged answer here, you can view the original 2D array as 1D now:
view = np.ndarray(data.shape[:1], dtype=dt, buffer=data)
Searching can be done in a totally straightforward manner now:
key = np.array([('a', 'a')], dtype=dt)
index = np.searchsorted(view, key)
You can even find the insertion indices of incomplete elements by using the appropriate minimum values. For strings this would be ''
.
You may get better mileage out of the comparison if you don't have to check each field of the dtype. You can make a similar dtype with a single homogeneous field:
dt2 = np.dtype([('row', data.dtype, data.shape[1])])
Constructing the view is the same as before:
view = np.ndarray(data.shape[:1], dtype=dt2, buffer=data)
The key is done a little differently this time (another plug here):
key = np.array([(['a', 'a'],)], dtype=dt2)
The sort order imposed on objects is not correct with this method: Sorting array of objects by row using custom dtype. I am leaving a reference here in case there is a fix in the linked question. Also, it is still quite useful for sorting integers.
Integer Mapping
If you have a finite number of objects to search, it is easier to map them to integers:
idata = np.empty(data.shape, dtype=int)
keys = [None] * data.shape[1] # Map index to key per column
indices = [None] * data.shape[1] # Map key to index per column
for i in range(data.shape[1]):
keys[i], idata[:, i] = np.unique(data[:, i], return_inverse=True)
indices[i] = {k: i for i, k in enumerate(keys[i])} # Assumes hashable objects
idt = np.dtype([('row', idata.dtype, idata.shape[1])])
view = idata.view(idt).ravel()
This only works if data
actually contains all the possible keys in each column. Otherwise, you will have to get the forward and reverse mappings by other means. Once you have that established, setting up the keys is much simpler, and only requires indices
:
key = np.array([index[k] for index, k in zip(indices, ['a', 'a'])])
Further Improvements
If the number of categories you have is eight or less, and each category has 256 or fewer elements, you can construct an even better hash by fitting everything into a single np.uint64
element or so.
k = math.ceil(math.log(data.shape[1], 2)) # math.log provides base directly
assert 0 < k <= 64
idata = np.empty((data.shape[:1], k), dtype=np.uint8)
...
idata = idata.view(f'>u{k}').ravel()
Keys are made similarly as well:
key = np.array([index[k] for index, k in zip(indices, ['a', 'a'])]).view(f'>u{k}')
Timing
I've timed the methods shown here (not the other answers) using randomly shuffled strings. Key timing parameters are:
M
: number of rows: 10**{2, 3, 4, 5}N
: number of columns: 2**{3, 4, 5, 6}K
: number of elements to insert: 1, 10, M // 10
individual_fields
, combined_field
, int_mapping
, int_packing
. Functions shown below.For the last two methods, I assume that you will pre-convert the data into the mapped dtype, but not the search keys. I am therefore passing in the converted data, but timing the conversion of the keys.
import numpy as np
from math import ceil, log
def individual_fields(data, keys):
dt = [('', data.dtype)] * data.shape[1]
dview = np.ndarray(data.shape[:1], dtype=dt, buffer=data)
kview = np.ndarray(keys.shape[:1], dtype=dt, buffer=keys)
return np.searchsorted(dview, kview)
def combined_fields(data, keys):
dt = [('row', data.dtype, data.shape[1])]
dview = np.ndarray(data.shape[:1], dtype=dt, buffer=data)
kview = np.ndarray(keys.shape[:1], dtype=dt, buffer=keys)
return np.searchsorted(dview, kview)
def int_mapping(idata, keys, indices):
idt = np.dtype([('row', idata.dtype, idata.shape[1])])
dview = idata.view(idt).ravel()
kview = np.empty(keys.shape[0], dtype=idt)
for i, (index, key) in enumerate(zip(indices, keys.T)):
kview['row'][:, i] = [index[k] for k in key]
return np.searchsorted(dview, kview)
def int_packing(idata, keys, indices):
idt = f'>u{idata.shape[1]}'
dview = idata.view(idt).ravel()
kview = np.empty(keys.shape, dtype=np.uint8)
for i, (index, key) in enumerate(zip(indices, keys.T)):
kview[:, i] = [index[k] for k in key]
kview = kview.view(idt).ravel()
return np.searchsorted(dview, kview)
The timing code:
from math import ceil, log
from string import ascii_lowercase
from timeit import Timer
def time(m, n, k, fn, *args):
t = Timer(lambda: fn(*args))
s = t.autorange()[0]
print(f'M={m}; N={n}; K={k} {fn.__name__}: {min(t.repeat(5, s)) / s}')
selection = np.array(list(ascii_lowercase), dtype=object)
for lM in range(2, 6):
M = 10**lM
for lN in range(3, 6):
N = 2**lN
data = np.random.choice(selection, size=(M, N))
np.ndarray(data.shape[0], dtype=[('', data.dtype)] * data.shape[1], buffer=data).sort()
idata = np.array([[ord(a) - ord('a') for a in row] for row in data], dtype=np.uint8)
ikeys = [selection] * data.shape[1]
indices = [{k: i for i, k in enumerate(selection)}] * data.shape[1]
for K in (1, 10, M // 10):
key = np.random.choice(selection, size=(K, N))
time(M, N, K, individual_fields, data, key)
time(M, N, K, combined_fields, data, key)
time(M, N, K, int_mapping, idata, key, indices)
if N <= 8:
time(M, N, K, int_packing, idata, key, indices)
The results:
M=100 (units=us)
| K |
+---------------------------+---------------------------+
N | 1 | 10 |
+------+------+------+------+------+------+------+------+
| IF | CF | IM | IP | IF | CF | IM | IP |
---+------+------+------+------+------+------+------+------+
8 | 25.9 | 18.6 | 52.6 | 48.2 | 35.8 | 22.7 | 76.3 | 68.2 |
16 | 40.1 | 19.0 | 87.6 | -- | 51.1 | 22.8 | 130. | -- |
32 | 68.3 | 18.7 | 157. | -- | 79.1 | 22.4 | 236. | -- |
64 | 125. | 18.7 | 290. | -- | 135. | 22.4 | 447. | -- |
---+------+------+------+------+------+------+------+------+
M=1000 (units=us)
| K |
+---------------------------+---------------------------+---------------------------+
N | 1 | 10 | 100 |
+------+------+------+------+------+------+------+------+------+------+------+------+
| IF | CF | IM | IP | IF | CF | IM | IP | IF | CF | IM | IP |
---+------+------+------+------+------+------+------+------+------+------+------+------+
8 | 26.9 | 19.1 | 55.0 | 55.0 | 44.8 | 25.1 | 79.2 | 75.0 | 218. | 74.4 | 305. | 250. |
16 | 41.0 | 19.2 | 90.5 | -- | 59.3 | 24.6 | 134. | -- | 244. | 79.0 | 524. | -- |
32 | 68.5 | 19.0 | 159. | -- | 87.4 | 24.7 | 241. | -- | 271. | 80.5 | 984. | -- |
64 | 128. | 19.7 | 312. | -- | 168. | 26.0 | 549. | -- | 396. | 7.78 | 2.0k | -- |
---+------+------+------+------+------+------+------+------+------+------+------+------+
M=10K (units=us)
| K |
+---------------------------+---------------------------+---------------------------+
N | 1 | 10 | 1000 |
+------+------+------+------+------+------+------+------+------+------+------+------+
| IF | CF | IM | IP | IF | CF | IM | IP | IF | CF | IM | IP |
---+------+------+------+------+------+------+------+------+------+------+------+------+
8 | 28.8 | 19.5 | 54.5 | 107. | 57.0 | 27.2 | 90.5 | 128. | 3.2k | 762. | 2.7k | 2.1k |
16 | 42.5 | 19.6 | 90.4 | -- | 73.0 | 27.2 | 140. | -- | 3.3k | 752. | 4.6k | -- |
32 | 73.0 | 19.7 | 164. | -- | 104. | 26.7 | 246. | -- | 3.4k | 803. | 8.6k | -- |
64 | 135. | 19.8 | 302. | -- | 162. | 26.1 | 466. | -- | 3.7k | 791. | 17.k | -- |
---+------+------+------+------+------+------+------+------+------+------+------+------+
individual_fields
(IF) is generally fastest working method. Its complexity grows in proportion to the number of columns. Unfortunately combined_fields
(CF) does not work for object arrays. Otherwise, it would be not only the fastest method, but also one that does not gain complexity with increasing columns.
All the techniques I thought would be faster are not, because mapping python objects to keys is slow (actual lookup of packed int arrays, for example, is much much faster than structured arrays).
References
Here are the additional questions I had to ask to get this code working at all:
Upvotes: 3