Killian Tallman

Reputation: 45

Concatenating/Appending Multiple Vertical Arrays of Different Sizes

I have a function that returns a numpy array. I call it in a loop over different data files, and each iteration returns an array of a different size (which is the desired output), but I cannot figure out how to append these arrays properly. Example arrays, and the way I arrange them after reading the data from a file, are shown below:

import numpy as np

a1 = np.array([1,2,3])
a2 = np.vstack(a1)
# array([[1],
#        [2],
#        [3]])
b1 = np.array([4,5,6,7])
b2 = np.vstack(b1)
# array([[4],
#        [5],
#        [6],
#        [7]])

Put simply, I have two arrays, one with 3 elements and one with 4. I want to arrange them side by side as columns, something like this, so they can be exported:

1  4 
2  5
3  6
   7

I do not want zeros or NaN values filling the gaps in the data, as that would create more work.

This needs to work for vertical arrays with a column width of 2 to get output data to be organized like this:

1  2   5  6   10  11
2  3   6  7   11  12
3  4   7  8   12  13
       8  9 

So the first loop iteration would produce the (3,2) array and the second would produce the (4,2) array, which I want to append or concatenate to the original (3,2) array, and so on. Each array will always have a width of 2, but the length will change from one array to the next.

I have tried using the basic np.column_stack, np.concatenate, and np.append functions but they haven't worked. These can be lists instead of numpy arrays if that works better or even organizing the outputted data in a dataframe would be fine.

======= Update =======

To be more specific, and after trying some of the solutions provided, here are some more details on my issue. My function gets data from a data file (works fine) and returns 2 lists or arrays (whichever) of values that have the same dimensions (no issue here either).

Now I am trying to do this while looping over all of the files in a directory and I want to append/concatenate these two lists (or arrays) for each file together but they could be different sizes. The trouble arises when I try to put them together vertically to yield columns of the output data. Also I need to do a simple mathematical operation on the values within the loop so I think they might need to be numpy arrays (or something similar) and not a list.

Loop #1 returns:

outdata1 = [0.0012, 0.0013, 0.00124, 0.00127]
outdata2 = [0.0016, 0.0014, 0.00134, 0.0013]

Loop #2 returns:

outdata1 = [0.00155, 0.00174, 0.0018]
outdata2 = [0.0019, 0.0020, 0.0021]

and so on...

Now I need to do math on these and write them out as vertically organized column data without cutting off any data. Filling the gaps with NaN, or using a dataframe, would be acceptable if I could clean up those gaps before export. I would like it to look like this:

0.0012   0.0016   0.00155  0.0019
0.0013   0.0014   0.00174  0.0020
0.00124  0.00134  0.0018   0.0021
0.00127  0.0013

Upvotes: 4

Views: 2566

Answers (2)

hpaulj

Reputation: 231615

First, vstack on an array treats the array as a list along its first dimension. It makes each 'row/element' into a 2d array and then concatenates them along the first axis.

These all do the same thing:

In [94]: np.vstack(np.array([1,2,3]))                                           
Out[94]: 
array([[1],
       [2],
       [3]])
In [95]: np.vstack([[1],[2],[3]])                                               
Out[95]: 
array([[1],
       [2],
       [3]])
In [96]: np.concatenate(([[1]],[[2]],[[3]]), axis=0)                            
Out[96]: 
array([[1],
       [2],
       [3]])

Arrays or lists of matching length can be joined with `column_stack`: the inputs are turned into (n,1) arrays and then joined on the 2nd dimension:

In [97]: np.column_stack(([1,2,3], [4,5,6]))                                    
Out[97]: 
array([[1, 4],
       [2, 5],
       [3, 6]])

But the ragged arrays don't work.
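To make the failure concrete, here is a minimal sketch (the variable names `short`, `long_` and `stacked` are just illustrative) showing that `np.column_stack` raises a `ValueError` when the inputs have different lengths:

```python
import numpy as np

short = [1, 2, 3]
long_ = [4, 5, 6, 7]

try:
    # Both inputs become (n, 1) columns, but n differs (3 vs 4),
    # so they cannot be joined side by side.
    np.column_stack((short, long_))
    stacked = True
except ValueError as err:
    stacked = False
    print("column_stack failed:", err)
```

The same mismatch is why `np.concatenate` and `np.append` fail on these inputs.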

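An array of lists/arrays of differing size has object dtype and is, for many purposes, like a list of lists. (Recent NumPy versions raise a ValueError for ragged input unless dtype=object is passed explicitly, so it is included here.)

In [98]: np.array(([1,2,3],[4,5,6,7]), dtype=object)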
Out[98]: array([list([1, 2, 3]), list([4, 5, 6, 7])], dtype=object)

Your last structure could be written as a ragged list of lists:

In [100]: [[1,2,5,6,10,11],[2,3,6,7,11,12],[3,4,7,8,12,13],[8,9]]               
Out[100]: [[1, 2, 5, 6, 10, 11], [2, 3, 6, 7, 11, 12], [3, 4, 7, 8, 12, 13], [8, 9]]
In [101]: np.array(_, dtype=object)
Out[101]: 
array([list([1, 2, 5, 6, 10, 11]), list([2, 3, 6, 7, 11, 12]),
       list([3, 4, 7, 8, 12, 13]), list([8, 9])], dtype=object)

Notice, though, that this doesn't line up the [8,9] with the others. You need some sort of filler/spacer. Python's itertools.zip_longest provides that:

In [102]: from itertools import zip_longest                                     
In [103]: alist = [[1,2,3],[2,3,4],[5,6,7,8],[11,12,13]]                        
In [104]: list(zip_longest(*alist))                                             
Out[104]: [(1, 2, 5, 11), (2, 3, 6, 12), (3, 4, 7, 13), (None, None, 8, None)]

With this padding we can make a 2d array (object dtype because of the None):

In [105]: np.array(_)                                                           
Out[105]: 
array([[1, 2, 5, 11],
       [2, 3, 6, 12],
       [3, 4, 7, 13],
       [None, None, 8, None]], dtype=object)
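If the values are floats, as in the question's update, a numeric filler may be more convenient than None. The following is a sketch, assuming NaN is an acceptable placeholder (the question says it can be cleaned up before export); it uses the `fillvalue` argument of `zip_longest` so the result stays a float array that supports arithmetic. The name `columns` is illustrative.

```python
from itertools import zip_longest

import numpy as np

# The four columns from the question's update (two per loop iteration).
columns = [
    [0.0012, 0.0013, 0.00124, 0.00127],
    [0.0016, 0.0014, 0.00134, 0.0013],
    [0.00155, 0.00174, 0.0018],
    [0.0019, 0.0020, 0.0021],
]

# zip_longest transposes the columns into rows and pads short ones with NaN.
padded = np.array(list(zip_longest(*columns, fillvalue=np.nan)))

print(padded.shape)  # (4, 4)
print(padded.dtype)  # float64
print(padded * 2)    # arithmetic works; NaN entries stay NaN
```

Because the dtype is float rather than object, ordinary elementwise math applies to the whole table, and the padded cells simply propagate as NaN.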

===

I can generate the numbers in your last display with a little function:

In [232]: def foo(i,n): 
     ...:     return np.column_stack((np.arange(i,i+n), np.arange(i+1,i+1+n))) 
     ...:                                                                       
In [233]: foo(1,3)                                                              
Out[233]: 
array([[1, 2],
       [2, 3],
       [3, 4]])
In [234]: foo(5,4)                                                              
Out[234]: 
array([[5, 6],
       [6, 7],
       [7, 8],
       [8, 9]])
In [235]: foo(10,3)                                                             
Out[235]: 
array([[10, 11],
       [11, 12],
       [12, 13]])

I can put all those arrays in a list:

In [236]: [Out[233], Out[234], Out[235]]                                        
Out[236]: 
[array([[1, 2],
        [2, 3],
        [3, 4]]), array([[5, 6],
        [6, 7],
        [7, 8],
        [8, 9]]), array([[10, 11],
        [11, 12],
        [12, 13]])]

I can turn that list into an object dtype array:

In [237]: np.array([Out[233], Out[234], Out[235]], dtype=object)
Out[237]: 
array([array([[1, 2],
       [2, 3],
       [3, 4]]),
       array([[5, 6],
       [6, 7],
       [7, 8],
       [8, 9]]),
       array([[10, 11],
       [11, 12],
       [12, 13]])], dtype=object)

I could also display several rows of these arrays with:

In [238]: for i in range(3): 
     ...:     print(np.hstack([a[i,:] for a in Out[236]])) 
     ...:                                                                       
[ 1  2  5  6 10 11]
[ 2  3  6  7 11 12]
[ 3  4  7  8 12 13]

but to show the 4th row, which only exists for the middle array, I'd have to add more code to test whether we're off the end, and whether to add padding etc. I'll leave that exercise up to you, if it really matters. :)
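One possible way to handle that padding is sketched below, assuming NaN is an acceptable filler. `pad_rows` is a hypothetical helper, not a NumPy function; it pads each 2-column array to the tallest height before the arrays are joined with `np.hstack`. `foo` is repeated from above so the block runs on its own.

```python
import numpy as np

def foo(i, n):
    # Same generator as above: an (n, 2) array of consecutive integers.
    return np.column_stack((np.arange(i, i + n), np.arange(i + 1, i + 1 + n)))

def pad_rows(arr, n_rows):
    """Hypothetical helper: float copy of arr with NaN rows appended up to n_rows."""
    out = np.full((n_rows, arr.shape[1]), np.nan)
    out[: arr.shape[0]] = arr
    return out

arrays = [foo(1, 3), foo(5, 4), foo(10, 3)]
height = max(a.shape[0] for a in arrays)

# After padding, every array has the same number of rows, so hstack works.
table = np.hstack([pad_rows(a, height) for a in arrays])
print(table)
```

The result is a (4, 6) float array whose last row holds 8 and 9 in the middle pair of columns and NaN elsewhere. Note that the integers become floats, since NaN has no integer representation.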

Upvotes: 2

fountainhead

Reputation: 3722

Since you mentioned that lists are ok, why not use a list of such "vertical arrays"?:

my_list = []
while not_done_yet:  # placeholder loop condition
    two_col_array = your_func(some_param)  # your_func returns an (x,2) array
    my_list.append(two_col_array)

my_list would now be a list of arrays of shape (x,2), where x could be different for different arrays in the list.
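Since the question mentions that a dataframe would also be fine, here is a sketch of one way to turn such a list into side-by-side columns for export, assuming pandas is available. The sample values are taken from the question's update, and the names `frames`, `wide` and `csv_text` are illustrative. `pd.concat` with `axis=1` aligns the frames on their row index and fills missing cells with NaN, which `to_csv` writes as empty fields by default.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the list built in the loop: one (x, 2) array per file.
my_list = [
    np.column_stack(([0.0012, 0.0013, 0.00124, 0.00127],
                     [0.0016, 0.0014, 0.00134, 0.0013])),
    np.column_stack(([0.00155, 0.00174, 0.0018],
                     [0.0019, 0.0020, 0.0021])),
]

# One DataFrame per array, joined side by side; short ones are padded with NaN.
frames = [pd.DataFrame(a) for a in my_list]
wide = pd.concat(frames, axis=1, ignore_index=True)

# Write to an in-memory buffer here; a file path would work the same way.
buffer = io.StringIO()
wide.to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Any math can be done on the individual arrays inside the loop before they are appended, so the NaN padding only appears at the final step and shows up as blanks in the exported file.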

Upvotes: 0
