Reputation: 125
I'm trying to construct a numpy array, and then append integers and another array to it. I tried doing this:
import numpy

xyz_list = frag_str.split()
nums = numpy.array([])
coords = numpy.array([])
for i in range(int(len(xyz_list)/4)):
    numpy.append(nums, xyz_list[i*4])
    numpy.append(coords, xyz_list[i*4+1:(i+1)*4])
print(nums)
print(coords)
Printing the output only gives me empty arrays. Why is that?
In addition, how can I rewrite coords so that it is a 2D array like this: array([[0, 0, 0], [0, 0, 1], [0, 0, -1]])?
Upvotes: 2
Views: 13448
Reputation: 164613
numpy.append, unlike Python's list.append, does not operate in place. Therefore, you need to assign the result back to a variable, as below.
import numpy

xyz_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
nums = numpy.array([])
coords = numpy.array([])
for i in range(len(xyz_list) // 4):
    nums = numpy.append(nums, xyz_list[i*4])                 # reassign the returned copy
    coords = numpy.append(coords, xyz_list[i*4+1:(i+1)*4])
print(nums)    # [1. 5. 9.]
print(coords)  # [ 2.  3.  4.  6.  7.  8. 10. 11. 12.]
You can reshape coords as follows:
coords = coords.reshape(3, 3)
# array([[ 2.,  3.,  4.],
#        [ 6.,  7.,  8.],
#        [10., 11., 12.]])
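If the number of groups is not fixed, passing -1 for one dimension lets NumPy infer it from the total size, so the same reshape works for any number of rows of three. A small sketch on made-up data:

```python
import numpy as np

coords = np.arange(12.0)          # made-up data: 4 rows x 3 coordinates, flattened
coords = coords.reshape(-1, 3)    # -1: NumPy works out the number of rows (4 here)
print(coords.shape)               # (4, 3)
```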
More details on numpy.append behaviour, from the NumPy documentation:
Returns: A copy of arr with values appended to axis. Note that append does not occur in-place: a new array is allocated and filled.
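A quick check of that behaviour: the original array is left untouched, and the result lives in a separate buffer (np.shares_memory is used here only to make the copy visible).

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)   # returns a new array; a itself is not modified

print(a)                       # [1 2 3]
print(b)                       # [1 2 3 4]
print(np.shares_memory(a, b))  # False: b was freshly allocated and filled
```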
If you know the shape of your numpy array output beforehand, it is efficient to instantiate it via np.zeros(n) and fill it with results later.
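A minimal sketch of preallocation applied to the question, assuming the input holds n groups of four numbers (a number followed by x, y, z); the sample list is made up. Because the shape is known up front, coords can be allocated directly as an (n, 3) array, which also gives the 2D layout asked for.

```python
import numpy as np

# Hypothetical input: groups of four values (number, x, y, z)
xyz_list = [1, 0, 0, 0, 2, 0, 0, 1, 3, 0, 0, -1]

n = len(xyz_list) // 4
nums = np.zeros(n)             # preallocate one slot per group
coords = np.zeros((n, 3))      # preallocate a 2D array, one row per group

for i in range(n):
    nums[i] = xyz_list[i*4]                  # fill in place, no reallocation
    coords[i] = xyz_list[i*4+1:(i+1)*4]      # assign a whole row at once

print(nums)    # [1. 2. 3.]
print(coords)
```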
Another option: if your calculations make heavy use of inserting elements to the left of an array, consider using collections.deque from the standard library.
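A small illustration of the deque suggestion: appendleft is O(1) on a deque, whereas inserting at the front of a list or NumPy array shifts or copies every element. The data is arbitrary; the conversion to an array happens once, at the end.

```python
from collections import deque

import numpy as np

d = deque()
for value in [1, 2, 3, 4]:
    d.appendleft(value)   # cheap insertion at the left end

arr = np.array(list(d))   # one conversion when all insertions are done
print(arr)                # [4 3 2 1]
```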
Upvotes: 4
Reputation: 231325
np.append is not a list clone. It is a clumsy wrapper around np.concatenate. It is better to learn to use that correctly.
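To see the relationship: with the default axis=None, np.append flattens both inputs and then joins them, so it matches np.concatenate on the ravelled inputs. That flattening is why appending a row to a 2D array without an axis loses the 2D shape. A sketch on arbitrary values:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([7, 8, 9])

flat = np.append(a, row)                          # axis=None: both inputs are flattened
same = np.concatenate([a.ravel(), row.ravel()])   # equivalent explicit call
print(flat)                                       # [1 2 3 4 5 6 7 8 9]
print(np.array_equal(flat, same))                 # True

# With an explicit axis the shapes must line up, and the 2D structure is kept
stacked = np.concatenate([a, row[np.newaxis, :]], axis=0)
print(stacked.shape)                              # (3, 3)
```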
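import numpy as np

xyz_list = frag_str.split()   # a list of strings
nums = []
coords = []
for i in range(len(xyz_list) // 4):
    nums.append(xyz_list[i*4])               # one string per group
    coords.append(xyz_list[i*4+1:(i+1)*4])   # a list of three strings per group
# The lists hold strings, not arrays, so convert with np.array
# (np.concatenate needs a list of arrays); dtype parses the text.
nums = np.array(nums, dtype=int)             # assuming integer values, as the question states
coords = np.array(coords, dtype=float)       # 2D, shape (number of groups, 3)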
List append is faster, and a list is easier to initialize. np.concatenate works fine with a list of arrays. np.append uses concatenate, but only accepts two inputs. np.array is needed if the list contains numbers or strings.
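A short sketch of that distinction, on arbitrary values: np.concatenate joins a list of arrays along an existing axis, while np.array builds a new array from a list of numbers (or of equal-length lists), adding a dimension. Handing a list of plain numbers to np.concatenate fails, because each element becomes a zero-dimensional array.

```python
import numpy as np

# A list of 1D arrays: concatenate joins them end to end
pieces = [np.array([1, 2]), np.array([3, 4, 5])]
joined = np.concatenate(pieces)
print(joined)                      # [1 2 3 4 5]

# A list of numbers, or of equal-length lists: np.array builds a new array
nums = np.array([1, 5])
coords = np.array([[2, 3, 4], [6, 7, 8]])
print(coords.shape)                # (2, 3)

# np.concatenate cannot join scalars
try:
    np.concatenate([1, 5])
    scalars_ok = True
except ValueError as err:
    scalars_ok = False
    print("ValueError:", err)
```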
You don't give an example of frag_str. But the name and the use of split suggest it is a string, and str.split returns a list of strings.
In [74]: alist = 'one two three four five six seven eight'.split()
That's a list of strings. Using your indexing I can construct 2 lists:
In [76]: [alist[i*4] for i in range(2)]
Out[76]: ['one', 'five']
In [77]: [alist[i*4+1:(i+1)*4] for i in range(2)]
Out[77]: [['two', 'three', 'four'], ['six', 'seven', 'eight']]
And I can make arrays from each of those lists:
In [78]: np.array(Out[76])
Out[78]: array(['one', 'five'], dtype='<U4')
In [79]: np.array(Out[77])
Out[79]:
array([['two', 'three', 'four'],
       ['six', 'seven', 'eight']], dtype='<U5')
In the first case the array is 1d, in the second, 2d.
If the string contains digits, we can make an integer array by specifying dtype.
In [80]: alist = '1 2 3 4 5 6 7 8'.split()
In [81]: np.array([alist[i*4] for i in range(2)])
Out[81]: array(['1', '5'], dtype='<U1')
In [82]: np.array([alist[i*4] for i in range(2)], dtype=int)
Out[82]: array([1, 5])
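Going one step further, and assuming frag_str contains only numbers in groups of four (a number followed by x, y, z), the loop can be dropped: convert all fields at once and reshape to one row per group. The sample string below is hypothetical.

```python
import numpy as np

frag_str = "1 0 0 0  2 0 0 1  3 0 0 -1"   # hypothetical input

fields = np.array(frag_str.split(), dtype=float).reshape(-1, 4)  # one row per group
nums = fields[:, 0].astype(int)   # first column, as integers
coords = fields[:, 1:]            # remaining three columns, shape (n, 3)

print(nums)          # [1 2 3]
print(coords.shape)  # (3, 3)
```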
Upvotes: 3
Reputation: 14584
As stated above, numpy.append does not append items in place, but the reason why is important. You must assign the array returned by numpy.append back to the original variable, or else your code will not work. That being said, you should likely rethink your logic.
Numpy uses C-style arrays internally, which are arrays in contiguous memory without leading or trailing unused elements. In order to append an item to an array, Numpy must allocate a buffer of the array size + 1, copy all the data over, and add the appended element.
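In simplified C, this comes to the following:
#include <stdlib.h>
#include <string.h>

int* numpy_append(const int* arr, size_t size, int element)
{
    int* new_arr = malloc(sizeof(int) * (size + 1));  /* new buffer, one slot larger */
    if (new_arr == NULL)
        return NULL;
    memcpy(new_arr, arr, sizeof(int) * size);         /* copy every existing element */
    new_arr[size] = element;                          /* add the new element at the end */
    return new_arr;
}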
This is extremely inefficient: a new array must be allocated each time (memory allocation is slow), all the elements must be copied over, and only then is the new element added to the end. Building an array of n elements one append at a time therefore copies O(n^2) elements in total.
In comparison, Python lists over-allocate: they reserve spare slots beyond the current length, and only when those run out do they reallocate, growing the capacity geometrically. This makes appending to the end amortized O(1), which is much more efficient than reallocating the entire buffer each time.
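One way to observe this over-allocation, in CPython specifically: sys.getsizeof reports the list object's allocated size, and the exact numbers and growth pattern are an implementation detail. The reported size changes only occasionally as elements are appended, because most appends land in slots reserved earlier.

```python
import sys

lst = []
sizes = []
for i in range(100):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))   # allocated size in bytes (CPython-specific)

# Each distinct size corresponds to one reallocation
resizes = len(set(sizes))
print(resizes, "distinct sizes over", len(lst), "appends")
```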
In all scenarios, avoid calling numpy.append in a loop. Use Python lists and list.append, and then convert the finished list to a NumPy array once. Or, if performance is truly critical, write a C++ extension using std::vector. Rewrite your code, or it will be glacial.
Edit
Also, as pointed out in the comments, if you know the size of a NumPy array beforehand, pre-allocating it with np.zeros(n) is efficient, as is using a custom wrapper around a NumPy array that doubles its capacity when full, as a Python list does:
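import numpy as np

class extendable_array:
    '''A NumPy-backed array that grows by doubling, like a Python list.'''
    def __init__(self, capacity=0, dtype=int):
        self.arr = np.zeros(capacity, dtype=dtype)  # preallocated buffer
        self.size = 0                               # number of slots in use

    def grow(self):
        '''Double the capacity of the buffer (to at least 1).'''
        arr = self.arr
        self.arr = np.zeros(max(arr.size * 2, 1), dtype=arr.dtype)
        self.arr[:arr.size] = arr

    def append(self, value):
        '''Append a value, reallocating only when the buffer is full.'''
        if self.arr.size == self.size:
            self.grow()
        self.arr[self.size] = value
        self.size += 1

    def to_array(self):
        '''Return a view of the filled part of the buffer.'''
        return self.arr[:self.size]

    # add more methods here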
Upvotes: 2