Reputation: 105067
I have the following code:
r = numpy.zeros(shape = (width, height, 9))
It creates a width x height x 9
matrix filled with zeros. Instead, I'd like to know if there's a function or way to initialize them instead to NaN
s in an easy way.
Upvotes: 304
Views: 452886
Reputation: 11271
Yet another possibility not yet mentioned here is to use NumPy tile:
a = numpy.tile(numpy.nan, (3, 3))
Also gives
array([[ NaN, NaN, NaN],
[ NaN, NaN, NaN],
[ NaN, NaN, NaN]])
update:
I did a speed comparison, and it's not very fast :/ It's slower than the ones_times
by a decimal.
Upvotes: 10
Reputation: 320
Pardon my tardiness, but here is the fastest solution for large arrays, iff single-precision (f4 float32) is all you need. And yes, np.nan
works as well.
def full_single_prec(n):
return numpy.full(n, val, dtype='f4')
Upvotes: 3
Reputation: 49
>>> width = 2
>>> height = 3
>>> r = np.full((width, height, 9), np.nan)
>>> print(r)
array([[[nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan]],
[[nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan]]])
>>> r.shape
(2, 3, 9)
Upvotes: 4
Reputation: 95
Just a warning that initializing with np.empty()
without subsequently editing the values can lead to (memory allocation?) problems:
arr1 = np.empty(96)
arr2 = np.empty(96)
print(arr1)
print(arr2)
# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1. 1.
# 1. 1. 2. 2. 2. 2. nan nan nan nan nan nan nan nan 0. 0. 0. 0.
# 0. 0. 0. 0. nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan]
#
# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1. 1.
# 1. 1. 2. 2. 2. 2. nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan]
The floats initialized in the array are used somewhere else in my script but are not associated with variables arr1
or arr2
at all. Spooky.
Answer from user @JHBonarius solved this problem:
arr = np.tile(np.nan, 96)
print(arr)
# [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
# nan nan nan nan nan nan]
Upvotes: 4
Reputation: 58791
I compared the suggested alternatives for speed and found that, for large enough vectors/matrices to fill, all alternatives except val * ones
and array(n * [val])
are equally fast.
Code to reproduce the plot:
import numpy
import perfplot
val = 42.0
def fill(n):
a = numpy.empty(n)
a.fill(val)
return a
def colon(n):
a = numpy.empty(n)
a[:] = val
return a
def full(n):
return numpy.full(n, val)
def ones_times(n):
return val * numpy.ones(n)
def list(n):
return numpy.array(n * [val])
b = perfplot.bench(
setup=lambda n: n,
kernels=[fill, colon, full, ones_times, list],
n_range=[2 ** k for k in range(20)],
xlabel="len(a)",
)
b.save("out.png")
Upvotes: 88
Reputation: 49803
You rarely need loops for vector operations in numpy. You can create an uninitialized array and assign to all entries at once:
>>> a = numpy.empty((3,3,))
>>> a[:] = numpy.nan
>>> a
array([[ NaN, NaN, NaN],
[ NaN, NaN, NaN],
[ NaN, NaN, NaN]])
I have timed the alternatives a[:] = numpy.nan
here and a.fill(numpy.nan)
as posted by Blaenk:
$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a.fill(np.nan)"
10000 loops, best of 3: 54.3 usec per loop
$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a[:] = np.nan"
10000 loops, best of 3: 88.8 usec per loop
The timings show a preference for ndarray.fill(..)
as the faster alternative. OTOH, I like numpy's convenience implementation where you can assign values to whole slices at the time, the code's intention is very clear.
Note that ndarray.fill
performs its operation in-place, so numpy.empty((3,3,)).fill(numpy.nan)
will instead return None
.
Upvotes: 414
Reputation: 1297
Another alternative is numpy.broadcast_to(val,n)
which returns in constant time regardless of the size and is also the most memory efficient (it returns a view of the repeated element). The caveat is that the returned value is read-only.
Below is a comparison of the performances of all the other methods that have been proposed using the same benchmark as in Nico Schlömer's answer.
Upvotes: 8
Reputation: 3196
Another option is to use numpy.full
, an option available in NumPy 1.8+
a = np.full([height, width, 9], np.nan)
This is pretty flexible and you can fill it with any other number that you want.
Upvotes: 305
Reputation: 14075
As said, numpy.empty() is the way to go. However, for objects, fill() might not do exactly what you think it does:
In[36]: a = numpy.empty(5,dtype=object)
In[37]: a.fill([])
In[38]: a
Out[38]: array([[], [], [], [], []], dtype=object)
In[39]: a[0].append(4)
In[40]: a
Out[40]: array([[4], [4], [4], [4], [4]], dtype=object)
One way around can be e.g.:
In[41]: a = numpy.empty(5,dtype=object)
In[42]: a[:]= [ [] for x in range(5)]
In[43]: a[0].append(4)
In[44]: a
Out[44]: array([[4], [], [], [], []], dtype=object)
Upvotes: 7
Reputation: 19259
You can always use multiplication if you don't immediately recall the .empty
or .full
methods:
>>> np.nan * np.ones(shape=(3,2))
array([[ nan, nan],
[ nan, nan],
[ nan, nan]])
Of course it works with any other numerical value as well:
>>> 42 * np.ones(shape=(3,2))
array([[ 42, 42],
[ 42, 42],
[ 42, 42]])
But the @u0b34a0f6ae's accepted answer is 3x faster (CPU cycles, not brain cycles to remember numpy syntax ;):
$ python -mtimeit "import numpy as np; X = np.empty((100,100));" "X[:] = np.nan;"
100000 loops, best of 3: 8.9 usec per loop
(predict)laneh@predict:~/src/predict/predict/webapp$ master
$ python -mtimeit "import numpy as np; X = np.ones((100,100));" "X *= np.nan;"
10000 loops, best of 3: 24.9 usec per loop
Upvotes: 14
Reputation: 38586
Are you familiar with numpy.nan
?
You can create your own method such as:
def nans(shape, dtype=float):
a = numpy.empty(shape, dtype)
a.fill(numpy.nan)
return a
Then
nans([3,4])
would output
array([[ NaN, NaN, NaN, NaN],
[ NaN, NaN, NaN, NaN],
[ NaN, NaN, NaN, NaN]])
I found this code in a mailing list thread.
Upvotes: 28