Append numpy ndarrays with different dimensions in loop

Question

I need to append the arrays created in each loop so that I get a single ndarray at the end. The code structure is like this:

for...:
      .
      .
      .
     for...:
         list1 = array([some_math_here])
         list2.append(list1)

     #each loop creates a list; converting it to array() gives differently shaped arrays:
     array(list2).shape
     (2939, 4)
     (2942, 4)
     (2027, 4)
     (2030, 4)

     #list3 collects all the generated results
     list3.append(list2)

Q: How can I get a single array instead of list3, with n*4 columns, when the individual blocks have different numbers of rows?

I tried creating an initial array a = array([0.,1.]) and then calling append(a,array(list_2)), but that doesn't work. I'm aware of hstack, vstack, etc., but I can't work out how to use them together with append in the loop. Any advice?

UPDATE: Here's the actual code, with the output from the suggested methods:

files_ = glob.glob('D:\Test files\*.txt')
tfile_ = loadtxt('times.txt')    
averages_, d = [], []

with open ('outfile.csv', 'wb') as outfile:
    writer = csv.writer(outfile)

    for fcount_, fname_ in enumerate(files_):   
        data = loadtxt(fname_ , usecols = (1,2,3,4))    
        average_, fcol = [], []
        seg_len = 3

        for x in range(0, len(data[:,0]), seg_len):
            sample_means = [mean(data[x:x+seg_len,i]) for i in range(4)]
            none_zeros = [m if m >= 0 else 0 for m in sample_means]
            average_.append(none_zeros)

        fcol = cumsum(array(average_)[:,0])
        average_ = array([row + [col] for row, col in zip(average_, fcol)])
        averages_.append(average_)
    d = concatenate(array(averages_))    
    df = pd.DataFrame(d)
    df.to_csv('pdtest2.csv')

output:

           0         1         2         3         4
0   0.037039  0.103792  0.136116  0.579297  0.037039
1   0.051183  0.104669  0.177728  0.593771  0.088222
2   0.059517  0.105437  0.174274  0.571402  0.147739
3   0.053212  0.102476  0.167530  0.645745  0.200950
4   0.054637  0.104450  0.165228  0.596622  0.054637
5   0.051622  0.101161  0.166708  0.595964  0.106259
6   0.057324  0.099077  0.168024  0.596841  0.163583
7   0.054692  0.103573  0.157168  0.598596  0.218275
8   0.066699  0.100612  0.145984  0.591578  0.284974
9   0.120866  5.527104  4.678589  2.401020  0.120866
10  0.113958  5.176220  4.669872  2.361985  0.234824
11  0.121469  4.879613  4.659017  2.359573  0.356293
12  0.122511  4.695618  4.642240  2.363959  0.478803
13  0.126650  4.621933  4.620447  2.347073  0.605453
14  0.132708  4.676868  4.517568  2.364617  0.132708
15  0.125087  4.693535  4.459672  2.381941  0.257795
16  0.132708  4.715246  4.444705  2.334353  0.390503
17  0.133476  4.745619  4.406300  2.317467  0.523979

whereas I want:

    0           1           2           3           4           5           6           7           8           9           10          11          12          13          14          15          16          17          18          19
0   0.037038522 0.103792144 0.136115724 0.57929719  0.037038522 0.054637318 0.104450043 0.16522775  0.596621864 0.054637318 0.12086581  5.527104488 4.678589189 2.401020431 0.12086581  0.132707991 4.67686799  4.517567512 2.364616645 0.132707991
1   0.051183348 0.104669343 0.177727829 0.593770968 0.08822187  0.051621948 0.101160549 0.166708023 0.595963965 0.106259265 0.113957871 5.176219782 4.669871979 2.361985046 0.234823681 0.125087328 4.693534961 4.459672089 2.381941338 0.257795319
2   0.059516735 0.105436892 0.17427386  0.571402402 0.147738605 0.057323738 0.099077202 0.168023821 0.596841163 0.163583003 0.121468884 4.879613015 4.659016582 2.359572747 0.356292565 0.132707991 4.715245885 4.444704808 2.334353258 0.39050331
3   0.05321187  0.102476346 0.167530397 0.645744989 0.200950475 0.054692143 0.103572845 0.157168489 0.598595561 0.218275146 0.122510557 4.695618334 4.642240062 2.363958746 0.478803122 0.13347554  4.745619253 4.406299754 2.317467166 0.52397885
4   0           0           0           0           0           0.066698797 0.1006123   0.145984208 0.591577971 0.284973943 0.126649838 4.621932787 4.620447035 2.347072653 0.60545296  0           0            0          0           0

CT Zhu · Accepted Answer

No, you can't create an n*4 2-D array if n is different for each column:

>>> np.vstack((np.arange(10),np.arange(1,11),np.arange(2,12)))
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11]])
>>> np.vstack((np.arange(10),np.arange(0,11),np.arange(0,12)))

Traceback (most recent call last):
  File "", line 1, in 
    np.vstack((np.arange(10),np.arange(0,11),np.arange(0,12)))
  File "C:\Python27\lib\site-packages\numpy\core\shape_base.py", line 226, in vstack
    return _nx.concatenate(map(atleast_2d,tup),0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

Note the ValueError: it is raised because the arrays have different lengths.

You either have to keep list3 as a list or pad each list2 to equal length.

For higher dimensions the same rule applies: np.vstack((np.ones((10,4)),np.ones((10,6)),np.ones((10,6)))) won't work, but np.vstack((np.ones((10,4)),np.ones((11,4)),np.ones((12,4)))) will, and it creates a 33*4 array.

In your case, if you vstack your list2s, you will get a 9938*4 array, if that is what you want. (I don't understand the "different number of rows" part.)
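As a minimal sketch of that vstack case: the arrays below are hypothetical stand-ins for the per-file list2 results (filled with ones, not the asker's data), using the four shapes quoted in the question.

```python
import numpy as np

# Hypothetical stand-ins for the per-file results: every block has
# 4 columns but a different number of rows, as in the question.
row_counts = [2939, 2942, 2027, 2030]
blocks = [np.ones((n, 4)) for n in row_counts]

# vstack only requires the non-stacking axis (the columns) to match,
# so blocks of different heights can be stacked on top of each other.
stacked = np.vstack(blocks)
print(stacked.shape)
```

Collecting the blocks in a plain list and stacking once after the loop avoids calling append on an array in every iteration.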

EDIT:

To pad the shorter arrays so that every array has the same shape, you need np.lib.pad (the same function is exposed as np.pad):

>>> b=np.random.randint(0,20, size=(12,4))
>>> np.lib.pad(b, ((0,3),(0,0)), 'constant', constant_values=[0.])
array([[ 5,  2, 10,  7],
       [ 7, 17,  8, 11],
       [ 7,  7,  2, 10],
       [16, 17, 15, 16],
       [ 0, 19,  5,  6],
       [18, 19, 18,  6],
       [ 2,  8, 11, 19],
       [ 3, 17, 18, 16],
       [10,  1, 12, 11],
       [ 0,  7,  1, 14],
       [ 7, 17,  8, 16],
       [12,  6,  3,  5],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0]])

((0,3),(0,0)) gives a (before, after) pad width for each axis: along the first axis, pad 0 elements at the beginning and 3 elements at the end; along the second axis, pad nothing. In your case you need ((0, max_length - length_of_current_array), (0, 0)).

Then you just stack them all up using np.hstack.
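Putting the padding and stacking steps together might look like the sketch below. The helper name pad_and_hstack and the sample blocks are made up for illustration; it uses np.pad, the public name for the padding function, and assumes every block is 2-D.

```python
import numpy as np

def pad_and_hstack(blocks, fill=0.0):
    """Pad each 2-D block with rows of `fill` up to the tallest block,
    then join the padded blocks side by side."""
    max_rows = max(b.shape[0] for b in blocks)
    padded = [
        # Pad only at the end of axis 0; leave axis 1 untouched.
        np.pad(b, ((0, max_rows - b.shape[0]), (0, 0)),
               mode="constant", constant_values=fill)
        for b in blocks
    ]
    return np.hstack(padded)

# Hypothetical blocks: 5 columns each, with 4, 5, 5 and 4 rows.
blocks = [np.ones((4, 5)), np.ones((5, 5)), np.ones((5, 5)), np.ones((4, 5))]
wide = pad_and_hstack(blocks)
print(wide.shape)
```

With these sample blocks the result has 5 rows and 20 columns, and the last row of the first and fourth blocks is filled with zeros, which is the layout asked for in the question.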

In my opinion, though, you may want to pad with nan instead of 0., since 0. may be a meaningful data value.
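A small sketch of the nan variant, with made-up values: nan only fits in a floating-point array, so an integer array would need .astype(float) first. The nan-aware reductions such as np.nanmean then skip the padding.

```python
import numpy as np

# Hypothetical short block (float dtype, so it can hold nan).
short = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

# Append one row of nan at the end of axis 0.
padded = np.pad(short, ((0, 1), (0, 0)),
                mode="constant", constant_values=np.nan)

# nanmean ignores the padded row, so the column means are unaffected.
col_means = np.nanmean(padded, axis=0)
print(padded)
print(col_means)
```

Padding with 0. instead would pull these column means toward zero, which is the risk mentioned above.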
