PyLearner

Reputation: 239

Append numpy ndarrays with different dimensions in loop

I need to append the arrays created in each loop so that I get a single ndarray at the end. The code structure is like this:

for ...:
    .
    .
    .
    for ...:
        list1 = array([some_math_here])
        list2.append(list1)

    # each loop creates a list; converting it with array() gives differently shaped arrays:
    array(list2).shape
    (2939, 4)
    (2942, 4)
    (2027, 4)
    (2030, 4)

    # list3 collects all the generated results
    list3.append(list2)

Q: How can I have an array instead of list3, with n*4 columns and a different number of rows?

I tried creating a starting array a = array([0.,1.]) and then calling append(a, array(list_2)), but it doesn't work. I'm aware of hstack, vstack, etc., but I can't work out how to use them together with append in the loop. Any advice?

UPDATE: Here's the actual code, with the output from the suggested methods:

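import csv
import glob

import pandas as pd
from numpy import array, concatenate, cumsum, loadtxt, mean

files_ = glob.glob(r'D:\Test files\*.txt')
tfile_ = loadtxt('times.txt')
averages_, d = [], []

with open('outfile.csv', 'wb') as outfile:
    writer = csv.writer(outfile)

    for fcount_, fname_ in enumerate(files_):
        data = loadtxt(fname_, usecols=(1, 2, 3, 4))
        average_, fcol = [], []
        seg_len = 3

        # Mean of each of the 4 columns over consecutive 3-row segments
        for x in range(0, len(data[:, 0]), seg_len):
            sample_means = [mean(data[x:x+seg_len, i]) for i in range(4)]
            # Replace negative means with 0 (a separate name avoids shadowing x)
            none_zeros = [m if m >= 0 else 0.0 for m in sample_means]
            average_.append(none_zeros)

        # Running total of the first column, appended as a fifth column
        fcol = cumsum(array(average_)[:, 0])
        average_ = array([row + [col] for row, col in zip(average_, fcol)])
        averages_.append(average_)
    # The per-file arrays have different row counts, so pass the list
    # straight to concatenate instead of wrapping it in array() first
    d = concatenate(averages_)
    df = pd.DataFrame(d)
    df.to_csv('pdtest2.csv')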

output:

           0         1         2         3         4
0   0.037039  0.103792  0.136116  0.579297  0.037039
1   0.051183  0.104669  0.177728  0.593771  0.088222
2   0.059517  0.105437  0.174274  0.571402  0.147739
3   0.053212  0.102476  0.167530  0.645745  0.200950
4   0.054637  0.104450  0.165228  0.596622  0.054637
5   0.051622  0.101161  0.166708  0.595964  0.106259
6   0.057324  0.099077  0.168024  0.596841  0.163583
7   0.054692  0.103573  0.157168  0.598596  0.218275
8   0.066699  0.100612  0.145984  0.591578  0.284974
9   0.120866  5.527104  4.678589  2.401020  0.120866
10  0.113958  5.176220  4.669872  2.361985  0.234824
11  0.121469  4.879613  4.659017  2.359573  0.356293
12  0.122511  4.695618  4.642240  2.363959  0.478803
13  0.126650  4.621933  4.620447  2.347073  0.605453
14  0.132708  4.676868  4.517568  2.364617  0.132708
15  0.125087  4.693535  4.459672  2.381941  0.257795
16  0.132708  4.715246  4.444705  2.334353  0.390503
17  0.133476  4.745619  4.406300  2.317467  0.523979

whereas what I want is:

    0           1           2           3           4           5           6           7           8           9           10          11          12          13          14          15          16          17          18          19
0   0.037038522 0.103792144 0.136115724 0.57929719  0.037038522 0.054637318 0.104450043 0.16522775  0.596621864 0.054637318 0.12086581  5.527104488 4.678589189 2.401020431 0.12086581  0.132707991 4.67686799  4.517567512 2.364616645 0.132707991
1   0.051183348 0.104669343 0.177727829 0.593770968 0.08822187  0.051621948 0.101160549 0.166708023 0.595963965 0.106259265 0.113957871 5.176219782 4.669871979 2.361985046 0.234823681 0.125087328 4.693534961 4.459672089 2.381941338 0.257795319
2   0.059516735 0.105436892 0.17427386  0.571402402 0.147738605 0.057323738 0.099077202 0.168023821 0.596841163 0.163583003 0.121468884 4.879613015 4.659016582 2.359572747 0.356292565 0.132707991 4.715245885 4.444704808 2.334353258 0.39050331
3   0.05321187  0.102476346 0.167530397 0.645744989 0.200950475 0.054692143 0.103572845 0.157168489 0.598595561 0.218275146 0.122510557 4.695618334 4.642240062 2.363958746 0.478803122 0.13347554  4.745619253 4.406299754 2.317467166 0.52397885
4   0           0           0           0           0           0.066698797 0.1006123   0.145984208 0.591577971 0.284973943 0.126649838 4.621932787 4.620447035 2.347072653 0.60545296  0           0            0          0           0

Upvotes: 3

Views: 7474

Answers (2)

CT Zhu

Reputation: 54340

No, you can't create an n*4 2D array if n is different for each column:

>>> np.vstack((np.arange(10),np.arange(1,11),np.arange(2,12)))
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11]])
>>> np.vstack((np.arange(10),np.arange(0,11),np.arange(0,12)))

Traceback (most recent call last):
  File "<pyshell#36>", line 1, in <module>
    np.vstack((np.arange(10),np.arange(0,11),np.arange(0,12)))
  File "C:\Python27\lib\site-packages\numpy\core\shape_base.py", line 226, in vstack
    return _nx.concatenate(map(atleast_2d,tup),0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

Note the ValueError raised when the arrays have different lengths.

You either have to keep list3 as a list or pad each list2 to the same length.

For higher dimensions the same rule applies: np.vstack((np.ones((10,4)),np.ones((10,6)),np.ones((10,6)))) won't work, but np.vstack((np.ones((10,4)),np.ones((11,4)),np.ones((12,4)))) will, creating a 33*4 array.

In your case, if you vstack your list2s, you will get a 9938*4 array, if that is what you want. (I don't understand the "different number of rows" part.)
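The shape rule can be checked directly. This sketch uses zero-filled stand-ins for the list2 results; the values are made up, and only the row counts 2939, 2942, 2027 and 2030 come from the question:

```python
import numpy as np

# Same column count, different row counts: vstack just adds up the rows.
print(np.vstack((np.ones((10, 4)), np.ones((11, 4)), np.ones((12, 4)))).shape)  # (33, 4)

# Zero-filled stand-ins with the row counts reported in the question;
# only the shapes matter here, not the values.
row_counts = [2939, 2942, 2027, 2030]
stacked = np.vstack([np.zeros((n, 4)) for n in row_counts])
print(stacked.shape)  # (9938, 4)

# Different column counts: vstack raises ValueError.
try:
    np.vstack((np.ones((10, 4)), np.ones((10, 6))))
except ValueError as exc:
    print("ValueError:", exc)
```

The exact wording of the ValueError message varies between NumPy versions, so it is printed rather than quoted here.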

EDIT:

To pad the shorter arrays so that every array has the same shape, you need np.lib.pad (available as np.pad in current NumPy):

>>> b=np.random.randint(0,20, size=(12,4))
>>> np.lib.pad(b, ((0,3),(0,0)), 'constant', constant_values=[0.])
array([[ 5,  2, 10,  7],
       [ 7, 17,  8, 11],
       [ 7,  7,  2, 10],
       [16, 17, 15, 16],
       [ 0, 19,  5,  6],
       [18, 19, 18,  6],
       [ 2,  8, 11, 19],
       [ 3, 17, 18, 16],
       [10,  1, 12, 11],
       [ 0,  7,  1, 14],
       [ 7, 17,  8, 16],
       [12,  6,  3,  5],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0]])

((0,3),(0,0)) means: along the first axis, pad 0 elements at the beginning and 3 at the end; along the second axis, pad nothing. In your case you need ((0, max_length - length_of_current_array), (0, 0)).

Then you just stack them all up using np.hstack.

That said, you may want to pad with nan instead of 0., since 0. may be a meaningful data value.
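A minimal sketch of the pad-then-hstack approach, padding with nan as suggested. The helper name `pad_and_hstack` and the example blocks are hypothetical; the block shapes only mirror the 4- and 5-row, 5-column per-file results in the question's output:

```python
import numpy as np

def pad_and_hstack(blocks, fill=np.nan):
    """Pad each 2-D block with `fill` rows at the bottom so that all blocks
    have the same number of rows, then stack them side by side."""
    max_rows = max(b.shape[0] for b in blocks)
    padded = [
        # astype(float) so that integer blocks can hold a nan fill value
        np.pad(b.astype(float), ((0, max_rows - b.shape[0]), (0, 0)),
               mode="constant", constant_values=fill)
        for b in blocks
    ]
    return np.hstack(padded)

# Hypothetical per-file results: 5 columns each, 4 or 5 rows.
blocks = [np.ones((4, 5)), 2 * np.ones((5, 5)), 3 * np.ones((5, 5)), 4 * np.ones((4, 5))]
wide = pad_and_hstack(blocks)
print(wide.shape)  # (5, 20)
```

Passing `fill=0.0` reproduces the zero padding shown in the question's desired output instead of nan.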

Upvotes: 3

John Zwinck

Reputation: 249153

I think what you want is a plain Python list of NumPy arrays. Each array will have the same number of columns and the same dtype, but possibly a different number of rows. So start with list3 = [] and call list3.append(arr) N times. Then, if you want one big NumPy array with all the rows, call np.concatenate(list3) once at the end to join them all. This is more efficient than concatenating NumPy arrays inside the loop, because each such concatenation copies all of the data accumulated so far.
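A minimal sketch of that pattern, assuming each iteration yields an (n, 4) array. `make_block` is a hypothetical stand-in for whatever computation produces each block:

```python
import numpy as np

# Hypothetical stand-in for the per-iteration result: an (n, 4) array
# whose row count n varies from one iteration to the next.
def make_block(n_rows):
    return np.arange(n_rows * 4, dtype=float).reshape(n_rows, 4)

list3 = []                       # plain Python list, cheap to append to
for n in (3, 5, 2):
    list3.append(make_block(n))  # earlier blocks are not copied here

# A single concatenation at the end copies every row exactly once.
big = np.concatenate(list3)      # axis=0 by default: rows are stacked
print(big.shape)  # (10, 4)
```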

Upvotes: 1
