hintze

Reputation: 630

ValueError could not broadcast where mask from shape (1) into shape ()

There are many questions here on SO about ValueErrors where input arrays of different shapes cannot be broadcast, but none of them relate to masks:

ValueError could not broadcast where mask from shape (1) into shape ()

What I'm doing is the following:

The reason why is irrelevant for this question, but may be found in this question.

Minimal code:

import netCDF4

with netCDF4.Dataset(inputFile) as src, \
            netCDF4.Dataset(outputFile, "w") as dst:

    # copy global attributes all at once via dictionary
    dst.setncatts(src.__dict__)

    for variable in src.variables.values():
        # create dimensions first (skip those that already exist in dst)
        for dimName in variable.dimensions:
            if dimName in dst.dimensions:
                continue
            dimension = src.dimensions[dimName]
            dst.createDimension(
                        dimName,
                        (len(dimension)
                            if not dimension.isunlimited() else None))

        # now create variable
        newVar = dst.createVariable(
                variable.name,
                variable.datatype,
                variable.dimensions)

        # copy variable attributes all at once via dictionary
        newVar.setncatts(variable.__dict__)

        # copy content
        newVar[:] = variable[:]

This works on newer Python versions (tested with >= 3.6) for all variables, but fails on Python 2.7 for scalar NetCDF variables that have not been filled. In the debugger, at the point where the exception is raised, the variable of interest looks like this (in both Python 2.7 and 3.6):

>>> variable.shape
()
>>> type(variable[:])
<class 'numpy.ma.core.MaskedConstant'>
>>> variable[:].mask
array(True, dtype=bool)
>>> variable[:]
masked
>>> print(variable[:])
--

So this only happens on empty scalar NetCDF variables. Assigning a masked constant to another masked constant, on the other hand, works. The failure occurs only for a masked constant inside a netCDF4._netCDF4.Variable. Why? And how can I fix it?

edit: failure occurs with numpy 1.7.1 and netcdf4 1.2.7. <- this turned out to be the source of the issue, see this answer.


variable is of type netCDF4._netCDF4.Variable, and print(dir(variable)) shows ['__array__', '__class__', '__delattr__', '__delitem__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__getitem__', '__hash__', '__init__', '__len__', '__new__', '__orthogonal_indexing__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '_assign_vlen', '_cmptype', '_enumtype', '_get', '_getdims', '_getname', '_grp', '_grpid', '_has_lsd', '_iscompound', '_isenum', '_isprimitive', '_isvlen', '_name', '_nunlimdim', '_put', '_toma', '_varid', '_vltype', 'assignValue', 'chunking', 'datatype', 'delncattr', 'dimensions', 'dtype', 'endian', 'filters', 'getValue', 'get_var_chunk_cache', 'getncattr', 'group', u'long_name', 'mask', 'name', 'ncattrs', 'ndim', 'renameAttribute', 'scale', 'set_auto_mask', 'set_auto_maskandscale', 'set_auto_scale', 'set_var_chunk_cache', 'setncattr', 'setncattr_string', 'setncatts', 'shape', 'size']


Yes I know Python 2 is EOL. But it is needed for [insert reason from legacy dev environment].

Upvotes: 0

Views: 349

Answers (2)

hintze

Reputation: 630

As surfaced through @norok2's answer, neither getValue() nor slicing with Ellipsis works on scalar variables in this case. Both (unexpectedly) raise the same ValueError on scalar NetCDF variables with Python 2.7. Nevertheless, based on that answer, I derived the following, which avoids the problem by not assigning to the newly created scalar NetCDF variable when the source value is unset.

if isinstance(variable[:], numpy.ma.core.MaskedConstant):
    if variable[:].mask[()]:
        continue
newVar[:] = variable[:]

The result is correct because the copy is skipped only if the original variable is a) scalar and b) masked (unset). Skipping the copy leaves newVar unset, which is the same outcome as if the unset value had been copied.
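The skip condition can also be factored into a small helper so that variable[:] is read only once. This is a minimal sketch using plain numpy; the name is_unset_scalar is hypothetical (it is not part of netCDF4 or numpy), and it has not been checked against numpy 1.7.1.

```python
import numpy as np


def is_unset_scalar(value):
    """Return True if value is a fully masked scalar (numpy's masked constant).

    Hypothetical helper for illustration; not part of netCDF4 or numpy.
    """
    if not isinstance(value, np.ma.core.MaskedConstant):
        return False
    # getmaskarray always returns a boolean array, even for 0-d input
    return bool(np.ma.getmaskarray(value).all())
```

In the copy loop this would be used as: read value = variable[:] once, then assign newVar[:] = value only if is_unset_scalar(value) is False.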


This issue appears to be version-specific. From numpy 1.10.0 through numpy 1.12.1, the raised exception changes to

IndexError: too many indices for array

As of numpy 1.13.0, the assignment works without any workaround.

This GitHub issue seems to be linked.
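If the same script has to run against several numpy versions, the workaround could be applied only where it is needed. The sketch below assumes the 1.13.0 threshold observed above (it is not taken from numpy's release notes), and needs_scalar_workaround is a hypothetical name.

```python
def needs_scalar_workaround(numpy_version):
    """Return True for numpy releases older than 1.13.0.

    Hypothetical helper; the 1.13.0 threshold comes from the
    observations in this answer, not from numpy's release notes.
    """
    # Compare only major and minor, so suffixes like "0rc1" are ignored
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) < (1, 13)
```

It would typically be called as needs_scalar_workaround(numpy.__version__).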

Upvotes: 1

norok2

Reputation: 26916

If the documentation applies to your version, then for a variable whose shape is an empty tuple you could use either the netCDF4.Variable.getValue() / netCDF4.Variable.assignValue() combination, e.g.:

if variable.shape:
    newVar[:] = variable[:]
else:
    newVar.assignValue(variable.getValue())

or newVar[...] = variable[...], e.g.:

slicing = slice(None) if variable.shape else Ellipsis
newVar[slicing] = variable[slicing]
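For background, plain numpy arrays show the same distinction between scalar and non-scalar indexing: a 0-d array accepts Ellipsis (or an empty tuple) but rejects a slice. The sketch below uses only numpy; a netCDF4 Variable implements its own indexing, so this is an analogy rather than a description of netCDF4 internals.

```python
import numpy as np

scalar = np.zeros(())          # 0-d array, shape == ()
scalar[...] = 5.0              # Ellipsis assignment works on 0-d arrays
value = scalar[()]             # empty-tuple index extracts the scalar

try:
    scalar[:]                  # a slice needs at least one dimension
    slice_failed = False
except IndexError:
    slice_failed = True
```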

Upvotes: 2
