Reputation: 557
I am defining an array which should look like this
['word1', 2000, 21]
['word2', 2002, 33]
['word3', 1988, 51]
['word4', 1999, 26]
['word5', 2001, 72]
However, when I append a new entry I get a TypeError.
import numpy as np
npdtype = [('word', 'S35'), ('year', int), ('wordcount', int)]
np_array = np.empty((0,3), dtype=npdtype)
word = 'word1'
year = '2001'
word_count = '21'
np_array = np.append(np_array, [['word1', int(year), int(word_count)]], axis=0)
Traceback
File "/home/matt/.local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4586, in append
return concatenate((arr, values), axis=axis)
TypeError: invalid type promotion
What am I doing wrong?
Thanks
Upvotes: 1
Views: 2419
Reputation: 231385
np.append is a way of calling np.concatenate. Look at its code. Note it has to make sure the 2nd argument is an array. It does that without knowledge of your special dtype. Try that. It probably produces a string dtype. Then it tries the concatenate. So you need to make an array with the correct dtype first.
I discourage the use of append; it's better to use concatenate directly, so that you have to understand all the details.
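A minimal sketch of that advice, reusing the dtype from the question: give the new record the structured dtype first, then concatenate two 1-D arrays. The names base and row are made up for illustration, and a 1-D starting array is assumed instead of the (0,3) one.

```python
import numpy as np

npdtype = [('word', 'S35'), ('year', int), ('wordcount', int)]

# Start from an empty 1-D structured array: 0 records, 3 fields each.
base = np.empty(0, dtype=npdtype)

# Build the new record with the same dtype; note the tuple inside the list.
row = np.array([(b'word1', 2001, 21)], dtype=npdtype)

# Both inputs share one dtype, so concatenate needs no type promotion.
result = np.concatenate((base, row))
print(result.shape)
print(result['year'])
```

Because both arguments already carry the same dtype, concatenate only has to join the records.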
======================
Expanding on your answer:
In [75]: npdtype
Out[75]: [('word', 'S35'), ('year', numpy.int16), ('wordcount', numpy.int16)]
In [76]: column = np.array( [b'word1', np.int16(year), np.int16(word_count)], dtype=npdtype)
In [77]: column
Out[77]:
array([(b'word1', 0, 0),
(b'\xd1\x07', 0, 0),
(b'\x15', 0, 0)],
dtype=[('word', 'S35'), ('year', '<i2'), ('wordcount', '<i2')])
I don't think this is what you want.
The correct way to provide data for a structured array record is with a tuple, or a list of tuples (note the extra ()):
In [78]: column = np.array( [(b'word1', np.int16(year), np.int16(word_count))], dtype=npdtype)
In [79]: column
Out[79]:
array([(b'word1', 2001, 21)],
dtype=[('word', 'S35'), ('year', '<i2'), ('wordcount', '<i2')])
In [80]: column.shape
Out[80]: (1,)
Now I have a 1d, 1 element array with 3 fields.
Without the [], I get a single element 0d array
In [81]: column0 = np.array( (b'word1', np.int16(year), np.int16(word_count)), dtype=npdtype)
In [82]: column0.shape
Out[82]: ()
In [83]: column0
Out[83]:
array((b'word1', 2001, 21),
dtype=[('word', 'S35'), ('year', '<i2'), ('wordcount', '<i2')])
I can concatenate several of the 1d arrays:
In [85]: np.concatenate([column,column,column])
Out[85]:
array([(b'word1', 2001, 21),
(b'word1', 2001, 21),
(b'word1', 2001, 21)],
dtype=[('word', 'S35'), ('year', '<i2'), ('wordcount', '<i2')])
In [86]: _.shape
Out[86]: (3,)
In [87]: __['year'] # access the 2nd field (not column)
Out[87]: array([2001, 2001, 2001], dtype=int16)
Regarding the need for b. You are using Py3 (as I am), and unicode is the default string type. So if you had used U35 in npdtype, you could have left off the b (bytestring flag).
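As a small illustration of the U35 alternative (the name udtype is hypothetical, not from the original post), a unicode field accepts a plain Python 3 string with no b prefix:

```python
import numpy as np

# Same fields as before, but the word field is unicode ('U35') rather than bytes ('S35').
udtype = [('word', 'U35'), ('year', np.int16), ('wordcount', np.int16)]

# A plain str works here; no b'' prefix is needed.
column = np.array([('word1', 2001, 21)], dtype=udtype)
print(column['word'][0])
print(column.shape)
```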
That (0,3) shape initial array is probably not what you want. It has 0 rows and 3 columns, but each element still has 3 dtype fields. Look at a (1,3) version:
In [88]: np.empty((1,3),dtype=npdtype)
Out[88]:
array([[(b'', 0, 0), (b'', 0, 0), (b'', 0, 0)]],
dtype=[('word', 'S35'), ('year', '<i2'), ('wordcount', '<i2')])
This has blanks and 0 because of what happens to be in the memory. They could have been random characters/numbers.
numpy lets you make arrays with one or more 0 dimensions, but they usually aren't useful. About the only place they appear is as the starting point for an iterative array definition, e.g.
arr = np.empty((0,3))
for i in range(10):
    # a 2d value plus axis=0 keeps the (n,3) shape; without axis, append flattens
    arr = np.append(arr, [[i,i+1,i+2]], axis=0)
which is better written as
ll = []
for i in range(10):
    ll.append([i,i+1,i+2])
arr = np.array(ll)
or
arr = np.empty((10,3))
for i in range(10):
    arr[i,:]=[i,i+1,i+2]
Repeated array concatenation is slower.
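The same list-first pattern carries over to the structured dtype from the question. This is only a sketch: the raw list is hypothetical input (values borrowed from the question's example rows), and the words are assumed to be ASCII.

```python
import numpy as np

npdtype = [('word', 'S35'), ('year', np.int16), ('wordcount', np.int16)]

# Hypothetical input rows; the numbers arrive as strings, as in the question.
raw = [('word1', '2000', '21'), ('word2', '2002', '33'), ('word3', '1988', '51')]

records = []
for word, year, count in raw:
    # One tuple per record; encode the word because the field is a bytestring.
    records.append((word.encode('ascii'), int(year), int(count)))

# Build the structured array once, after all records are collected.
arr = np.array(records, dtype=npdtype)
print(arr.shape)
print(arr['year'])
```

Appending to a Python list is cheap, and the array is allocated only once at the end.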
Upvotes: 2
Reputation: 21643
Follow @hpaulj's advice and then tidy up.
import numpy as np
npdtype = [('word', 'S35'), ('year', np.int16), ('wordcount', np.int16)]
np_array = np.empty((0,3), dtype=npdtype)
word = 'word1'
year = '2001'
word_count = '21'
column = np.array( [b'word1', np.int16(year), np.int16(word_count)], dtype=npdtype)
print (column.shape)
column.shape=-1,3
print (column.shape)
print (column)
result=np.concatenate((np_array,column),axis=0)
print (result)
#~ np_array = np.append(np_array, [['word1', int(year), int(word_count)]], axis=0)
The two things that I found:
Here's the output.
>pythonw -u "temp.py"
(3,)
(1, 3)
[[(b'word1', 0, 0) (b'\xd1\x07', 0, 0) (b'\x15', 0, 0)]]
[[(b'word1', 0, 0) (b'\xd1\x07', 0, 0) (b'\x15', 0, 0)]]
>Exit code: 0
Upvotes: 0