Reputation: 8144
I have some functions, part of a large analysis package, that require a boolean mask to divide array items into two groups. The functions look like this:
def process(data, a_mask):
    b_mask = -a_mask
    res_a = func_a(data[a_mask])
    res_b = func_b(data[b_mask])
    return res_a, res_b
Now I need to use these functions (without modification) on a big array whose items are all of class "a", but I would like to save RAM and avoid passing a boolean mask that is all True. For example, I could pass a slice like slice(None, None).
The problem is that the line b_mask = -a_mask will fail if a_mask is a slice. Ideally, -a_mask should give a 0-item selection.
I was thinking of creating a "modified" slice object that implements the __neg__() method so that it returns a null slice (for example slice(0, 0)). I don't know if this is possible.
Other solutions that leave the process() function unmodified while avoiding the allocation of an all-True boolean array are welcome as well.
Upvotes: 3
Views: 2281
Reputation: 19574
If you are concerned about memory use, then advanced indexing may be a bad idea. From the docs:
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
As it stands, the process function has:
- data of size n, say
- a_mask of size n (assuming advanced indexing)

And creates:
- b_mask of size n
- data[a_mask] of size m, say
- data[b_mask] of size n - m

This is effectively 4 arrays of size n.
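The difference between the two indexing modes can be checked directly. This is a minimal sketch, assuming a small integer array as stand-in data; the variable names are illustrative only and not part of the original code:

```python
import numpy as np

data = np.arange(10)
mask = np.ones(data.shape, dtype=bool)  # all-True mask: one extra byte per item

via_mask = data[mask]          # advanced (boolean) indexing allocates a new array
via_slice = data[slice(None)]  # basic slicing returns a view on the same buffer

print(np.shares_memory(data, via_mask))   # False
print(np.shares_memory(data, via_slice))  # True
```

With a slice, neither the mask nor the selected data needs a fresh allocation.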
Basic slicing seems to be your best option then; however, Python doesn't seem to allow subclassing slice:
TypeError: Error when calling the metaclass bases
    type 'slice' is not an acceptable base type
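The restriction can be reproduced with a few lines. This is a sketch only: NegSlice is a hypothetical name, and the exact wording of the error message may differ between Python versions:

```python
subclass_error = None
try:
    # Attempt to derive from the built-in slice type.
    class NegSlice(slice):
        def __neg__(self):
            return slice(0)
except TypeError as exc:
    # CPython refuses slice as a base class when the class body is evaluated.
    subclass_error = str(exc)

print(subclass_error)
```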
See @ali_m's answer for a solution that incorporates slicing.
Alternatively, you could just bypass process and get your results as
result = func_a(data), func_b([])
Upvotes: 0
Reputation: 74262
Unfortunately we can't add a __neg__() method to slice, since it cannot be subclassed. However, tuple can be subclassed, and we can use it to hold a single slice object.
This leads me to a very, very nasty hack which should just about work for you:
class NegTuple(tuple):
    def __neg__(self):
        return slice(0)
We can create a NegTuple containing a single slice object:
nt = NegTuple((slice(None),))
This can be used as an index, and negating it will yield an empty slice resulting in a 0-length array being indexed:
import numpy as np

a = np.arange(5)
print(a[nt])
# [0 1 2 3 4]
print(a[-nt])
# []
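To see the hack working end to end with an unmodified process, here is a self-contained sketch. func_a and func_b are hypothetical placeholders (the real ones are not shown in the question), chosen only so the results are easy to check:

```python
import numpy as np

class NegTuple(tuple):
    def __neg__(self):
        return slice(0)  # negation yields an empty selection

def func_a(x):  # hypothetical stand-in for the real func_a
    return int(x.sum())

def func_b(x):  # hypothetical stand-in for the real func_b
    return int(x.size)

def process(data, a_mask):  # unchanged from the question
    b_mask = -a_mask
    res_a = func_a(data[a_mask])
    res_b = func_b(data[b_mask])
    return res_a, res_b

data = np.arange(5)
everything = NegTuple((slice(None),))
res_a, res_b = process(data, everything)
print(res_a, res_b)  # 10 0
```

Because the tuple holds a plain slice, data[everything] is a view, so no mask and no copy of the data are allocated.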
You would have to be very desperate to resort to something like this, though. Is it totally out of the question to modify process like this?
def process(data, a_mask=None):
    if a_mask is None:
        a_mask = slice(None)  # every element
        b_mask = slice(0)     # no elements
    else:
        b_mask = -a_mask
    res_a = func_a(data[a_mask])
    res_b = func_b(data[b_mask])
    return res_a, res_b
This is way more explicit, and should not have any effect on its behavior for your current use cases.
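One caveat: recent NumPy releases raise a TypeError for unary minus on a boolean array and expect ~ (or np.logical_not) instead. A sketch of the same idea under that assumption, with func_a and func_b as hypothetical placeholders rather than the real analysis functions:

```python
import numpy as np

def func_a(x):  # hypothetical stand-in for the real func_a
    return int(x.sum())

def func_b(x):  # hypothetical stand-in for the real func_b
    return int(x.size)

def process(data, a_mask=None):
    if a_mask is None:
        a_mask = slice(None)  # every element, as a view
        b_mask = slice(0)     # no elements
    else:
        b_mask = ~a_mask      # boolean inversion; assumes a_mask is a bool array
    return func_a(data[a_mask]), func_b(data[b_mask])

data = np.arange(6)
all_a = process(data)        # no mask allocated at all
mask = data % 2 == 0         # even items form class "a"
split = process(data, mask)  # existing mask-based calls keep working
print(all_a, split)  # (15, 0) (6, 3)
```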
Upvotes: 2
Reputation: 13289
Your solution is very similar to a degenerate sparse boolean array, although I don't know of any implementations of the same. My knee-jerk reaction is one of dislike, but if you really can't modify process, it's probably the best way.
Upvotes: 0