Leviathan

Reputation: 191

Appending to Numpy array produces one big array rather than an array of arrays

I want to append arrays to an array in the following way:

np.append([[1, 2, 3], [4, 5, 6]], [[7, 8, 9]], axis=0)
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Yet, when I don't write the arrays out, but try to do something like this

import re
import numpy as np

DataMatrix = np.array([])
dataArray = np.array([])
with open("fakedata.txt", "r") as file:
    for line in file.readlines():
        #f_list = [float(i) for i in line.split(" ") or i in line.split(", ") if i.strip()]
        rr = re.findall(r"[+-]?\d*[\.]?\d*(?:(?:[eE])[+-]?\d+)?", line)
        dataArray = np.array([])
        for numbers in rr:
            if numbers != "":
                dataArray = np.append(dataArray, float(numbers))
        DataMatrix = np.append(DataMatrix, dataArray, axis=0)
print(DataMatrix)

it just will not work. It produces one big flat array rather than an array of arrays. Putting extra [] brackets just about anywhere did not help. Every example I find uses explicit arrays, as shown above, rather than variables.

Upvotes: 0

Views: 713

Answers (3)

Iguananaut

Reputation: 23356

Assuming your file looks something like this:

1e1 1e2 -1e3
2.4e5 4.5e6 1.8e1
-1.1 -0.6 1.11

You can use np.loadtxt:

>>> import numpy as np
>>> import io
>>> matrix = """\
1e1 1e2 -1e3
2.4e5 4.5e6 1.8e1
-1.1 -0.6 1.11"""
>>> file = io.StringIO(matrix)
>>> np.loadtxt(file)
array([[ 1.00e+01,  1.00e+02, -1.00e+03],
       [ 2.40e+05,  4.50e+06,  1.80e+01],
       [-1.10e+00, -6.00e-01,  1.11e+00]])

In this case the default arguments to np.loadtxt will work, but if this isn't the exact format of your file there are various tweaks that can be made through its keyword arguments, such as delimiter, comments and skiprows. To pass it a filename directly, as in your case, you can use np.loadtxt('fakedata.txt') instead.
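As a rough sketch of one such tweak: if the file were comma-separated and started with a comment line (an assumption, since the actual layout of fakedata.txt isn't shown in the question), the delimiter and comments arguments might be set like this:

```python
import io
import numpy as np

# Hypothetical comma-separated variant of the data, with a leading comment line.
text = """\
# x,y,z
1e1,1e2,-1e3
2.4e5,4.5e6,1.8e1
-1.1,-0.6,1.11"""

# delimiter splits the fields on commas; lines starting with '#' are skipped.
matrix = np.loadtxt(io.StringIO(text), delimiter=",", comments="#")
print(matrix.shape)
```

Each row of the file becomes one row of the resulting 2-D array, so no manual appending is needed.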

Upvotes: 1

hpaulj

Reputation: 231605

Here's a modest tweak to your answer code. Without a txt file I can't test it, but I think it's right :)

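import re
import numpy as np

alist = []
with open("fakedata.txt", "r") as file:
    for line in file.readlines():
        rr = re.findall(r"[+-]?\d*[\.]?\d*(?:(?:[eE])[+-]?\d+)?", line)
        # the regex also yields empty matches, so keep only the non-empty strings
        innerlist = [numbers for numbers in rr if numbers != ""]
        alist.append(innerlist)
# convert all the collected strings to float in one step
DataMatrix = np.array(alist, dtype=float)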

I replaced the for loop with a list comprehension; that's mainly a syntactic cleanup. And deferred the conversion to float, so np.array can do it on all strings 'at once'.
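To illustrate the deferred conversion on its own, here is a small sketch with made-up strings standing in for the regex matches (the values are only an assumption about what the file contains):

```python
import numpy as np

# Nested list of numeric strings, one inner list per line of the file.
strings = [["1e1", "1e2", "-1e3"], ["2.4e5", "4.5e6", "1.8e1"]]

# np.array parses every string as a float while building the 2-D array.
matrix = np.array(strings, dtype=float)
print(matrix.shape, matrix.dtype)
```

This only works if every inner list has the same length; ragged rows will not give a clean 2-D float array.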

There have been several SO posts recently about list append versus array append. Nearly everyone agrees that list append like this is the right way. Repeated array append/concatenate is inefficient, and hard to get right. np.concatenate with a list is quite useful; np.append should (IMO) be deprecated.
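A minimal sketch of the difference, using two made-up rows in place of the parsed file: appending to an empty 1-D array keeps everything 1-D, while collecting rows in a list and converting once gives the 2-D result.

```python
import numpy as np

rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Repeated np.append onto an empty 1-D array: both operands are 1-D,
# so axis=0 simply joins them end to end.
flat = np.array([])
for row in rows:
    flat = np.append(flat, row, axis=0)

# List append, then a single conversion at the end.
collected = []
for row in rows:
    collected.append(row)
matrix = np.array(collected)

print(flat.shape)    # (6,)
print(matrix.shape)  # (2, 3)
```

The np.append version also copies the whole array on every iteration, which is where the inefficiency comes from.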

Upvotes: 2

Leviathan

Reputation: 191

Alright, the only way I managed it is to define a plain Python list (DataMatrix=[], rather than DataMatrix=np.array([])), and then use np.array(DataMatrix) at the end to get it into the form I want:

import re
import numpy as np

DataMatrix = []
with open("fakedata.txt", "r") as file:
    for line in file.readlines():
        rr = re.findall(r"[+-]?\d*[\.]?\d*(?:(?:[eE])[+-]?\d+)?", line)
        dataArray = []
        for numbers in rr:
            if numbers != "":
                dataArray.append(float(numbers))
        DataMatrix.append(dataArray)
# convert the list of lists to a 2-D array once, at the end
DataMatrix = np.array(DataMatrix)
print(DataMatrix)

Considering that I'm a total programming noob, this is probably not the smartest way to do so. But well...thanks for the downvote...

Upvotes: 0
