M.E.

Reputation: 5496

How do I combine two numpy arrays so for each row of the first array I append all rows from the second one?

I have the following numpy arrays:

theta_array =
array([[ 1, 10],
       [ 1, 11],
       [ 1, 12],
       [ 1, 13],
       [ 1, 14],
       [ 2, 10],
       [ 2, 11],
       [ 2, 12],
       [ 2, 13],
       [ 2, 14],
       [ 3, 10],
       [ 3, 11],
       [ 3, 12],
       [ 3, 13],
       [ 3, 14],
       [ 4, 10],
       [ 4, 11],
       [ 4, 12],
       [ 4, 13],
       [ 4, 14]])

XY_array  = 
array([[ 44.0394952 , 505.81099922],
       [ 61.03882938, 515.97253226],
       [ 26.69851841, 525.18083012],
       [ 46.78487831, 533.42309602],
       [ 45.77188401, 545.42988355],
       [ 81.12969132, 554.78767379],
       [ 54.178463  , 565.8716283 ],
       [ 41.58952084, 574.76827133],
       [ 85.24956815, 585.1355127 ],
       [ 80.73726733, 595.49446033],
       [ 22.70625059, 605.59017175],
       [ 40.66810604, 615.26308629],
       [ 47.16694695, 624.39222332],
       [ 48.72499541, 633.19846364],
       [ 50.68589921, 643.72334885],
       [ 38.42731134, 654.68595883],
       [ 47.39519707, 666.28232866],
       [ 58.07767155, 673.9572227 ],
       [ 72.11393347, 683.68307373],
       [ 53.70872932, 694.65509894],
       [ 82.08237952, 704.5868817 ],
       [ 46.64069738, 715.18427515],
       [ 40.46032478, 723.91308011],
       [ 75.69090892, 733.69595658],
       [120.61447884, 745.31322786],
       [ 60.17764744, 754.89747186],
       [ 87.15961973, 766.24040447],
       [ 82.93872713, 773.01518252],
       [ 93.56688906, 785.60640153],
       [ 70.0474047 , 793.81792947],
       [104.3613818 , 805.40234676],
       [108.39253837, 814.75002114],
       [ 78.97643673, 824.95386427],
       [ 85.69096895, 834.44797862],
       [ 53.07112931, 844.39555058],
       [111.49875807, 855.660508  ],
       [ 70.88824958, 865.53417489],
       [ 79.55499469, 875.31303945],
       [ 60.86941464, 885.85235946],
       [101.06017712, 896.69986636],
       [ 74.55823544, 905.87417231],
       [113.24705653, 915.19350121],
       [ 94.21920882, 925.87933273],
       [ 63.26478103, 933.70804578],
       [ 95.97827181, 945.76196917],
       [ 80.48623318, 955.60422694],
       [ 80.03451808, 964.39856485],
       [ 73.86032436, 973.91032818],
       [103.96923524, 984.24366761],
       [ 93.20663129, 995.44618851]])

I am trying to combine both, so that each row of theta_array is paired with every row of XY_array.

I am aware of this post, so I tried this:

np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)

But this generates:

array([[  1.        ,  44.0394952 ,   1.        , 505.81099922],
       [  1.        ,  61.03882938,   1.        , 515.97253226],
       [  1.        ,  26.69851841,   1.        , 525.18083012],
       ...,
       [ 14.        ,  73.86032436,  14.        , 973.91032818],
       [ 14.        , 103.96923524,  14.        , 984.24366761],
       [ 14.        ,  93.20663129,  14.        , 995.44618851]])

and the problem requires:

array([[  1.        ,   1.        ,  44.0394952 , 505.81099922],
       [  1.        ,   1.        ,  61.03882938, 515.97253226],
       [  1.        ,   1.        ,  26.69851841, 525.18083012],
       ...,
       [ 14.        ,  14.        ,  73.86032436, 973.91032818],
       [ 14.        ,  14.        , 103.96923524, 984.24366761],
       [ 14.        ,  14.        ,  93.20663129, 995.44618851]])

What is the way to do this combination/aggregation in numpy?

EDIT:

There is a mistake in the approach above: combining the two arrays this way does not produce that matrix. With a separate vector for each column, the combination that actually works is:

dataset = np.array(np.meshgrid(theta0_range, theta1_range, X)).T.reshape(-1,3)

And later the Y vector can be added as an additional column.
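A minimal sketch of how the Y column could be appended, assuming Y[k] belongs to X[k] and using small made-up vectors in place of the real ones (the values below are hypothetical). With this particular transpose and reshape, X is the slowest-varying column, so each Y value has to be repeated once per (theta0, theta1) pair:

```python
import numpy as np

# Hypothetical stand-ins for the vectors named in the edit above.
theta0_range = np.array([1, 2])
theta1_range = np.array([10, 11, 12])
X = np.array([44.0, 61.0])
Y = np.array([505.8, 516.0])  # assumption: Y[k] is the partner of X[k]

dataset = np.array(np.meshgrid(theta0_range, theta1_range, X)).T.reshape(-1, 3)

# Each X[k] fills a block of len(theta0_range) * len(theta1_range) consecutive
# rows, so Y is repeated with the same block size to stay aligned with X.
block = len(theta0_range) * len(theta1_range)
dataset = np.column_stack([dataset, np.repeat(Y, block)])

print(dataset.shape)  # (12, 4)
```

If the reshape order is changed, the repeat pattern for Y has to change with it, so it is worth checking that the X and Y columns still line up.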

Upvotes: 0

Views: 385

Answers (3)

M.E.

Reputation: 5496

As a complementary reference, here is an execution-time comparison of the two solutions. For this specific operation, the itertools version takes roughly 10 times longer than the numpy meshgrid version.

%%time

for i in range(1000):    
    z = np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,[0,2,1,3]]

CPU times: user 299 ms, sys: 0 ns, total: 299 ms
Wall time: 328 ms

%%time

for i in range(1000):    
    z = np.array([*product(theta_array, XY_array)])    
    z = z.reshape(z.shape[0],-1)

CPU times: user 2.79 s, sys: 474 µs, total: 2.79 s
Wall time: 2.84 s
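Before reading too much into the timings, it may be worth checking that both expressions build the same array. np.meshgrid flattens its inputs, so the meshgrid line pairs individual elements, while product pairs whole rows. A minimal sketch, using small stand-in arrays (hypothetical values, not the data from the question):

```python
import numpy as np
from itertools import product

theta_array = np.array([[1, 10], [1, 11], [2, 10]])   # 3 rows, 6 elements
XY_array = np.array([[44.0, 505.8], [61.0, 516.0]])   # 2 rows, 4 elements

# meshgrid works on the flattened inputs: 6 * 4 element pairs, packed 2 per row.
a = np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1, 4)[:, [0, 2, 1, 3]]

# product pairs whole rows: 3 * 2 row combinations.
b = np.array([*product(theta_array, XY_array)])
b = b.reshape(b.shape[0], -1)

print(a.shape, b.shape)  # (12, 4) (6, 4)
```

With the arrays from the question, the same reasoning gives 2000 rows for the meshgrid expression and 1000 for the product one, so the two loops are not doing the same amount of work.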

Upvotes: 1

David Erickson

Reputation: 16683

You can reorder the columns after using meshgrid by indexing with [:,[0,2,1,3]]. If you have a large number of columns and need to build that index list dynamically, see the end of my answer:

np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,[0,2,1,3]]

Output:

array([[  1.        ,   1.        ,  44.0394952 , 505.81099922],
       [  1.        ,   1.        ,  61.03882938, 515.97253226],
       [  1.        ,   1.        ,  26.69851841, 525.18083012],
       ...,
       [ 14.        ,  14.        ,  73.86032436, 973.91032818],
       [ 14.        ,  14.        , 103.96923524, 984.24366761],
       [ 14.        ,  14.        ,  93.20663129, 995.44618851]])

If you have many columns, you could build the index list (here [0,2,1,3]) dynamically with list comprehensions. For example:

# new_arr is not defined in this answer; n = 8 reproduces the output below.
n = new_arr.shape[1]*2
# Even column indices first, then the odd ones.
lst = [x for x in range(n) if x % 2 == 0] + [y for y in range(n) if y % 2 == 1]
lst

[0, 2, 4, 6, 1, 3, 5, 7]

Then, you could rewrite to:

np.array(np.meshgrid(theta_array, XY_array)).T.reshape(-1,4)[:,lst]

Upvotes: 1

Quang Hoang

Reputation: 150735

You can use itertools.product:

from itertools import product

out = np.array([*product(theta_array, XY_array)])
out = out.reshape(out.shape[0],-1)

Output:

array([[  1.        ,  10.        ,  44.0394952 , 505.81099922],
       [  1.        ,  10.        ,  61.03882938, 515.97253226],
       [  1.        ,  10.        ,  26.69851841, 525.18083012],
       ...,
       [  4.        ,  14.        ,  73.86032436, 973.91032818],
       [  4.        ,  14.        , 103.96923524, 984.24366761],
       [  4.        ,  14.        ,  93.20663129, 995.44618851]])
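For reference, the same row-by-row pairing can probably also be built without itertools, using np.repeat and np.tile. This is only a sketch of an alternative technique, not part of the answer above, and the small arrays are hypothetical stand-ins for the ones in the question:

```python
import numpy as np

theta_array = np.array([[1, 10], [1, 11], [2, 10]])
XY_array = np.array([[44.0, 505.8], [61.0, 516.0]])

n_theta, n_xy = len(theta_array), len(XY_array)

# Repeat each theta row n_xy times and tile the whole XY block n_theta times,
# so output row i * n_xy + j pairs theta_array[i] with XY_array[j].
out = np.hstack([
    np.repeat(theta_array, n_xy, axis=0),
    np.tile(XY_array, (n_theta, 1)),
])

print(out.shape)  # (6, 4)
```

The row order matches what product gives: the first array's rows vary slowest and the second array's rows cycle fastest.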

That said, this looks very much like an XY-problem. What are you trying to do with this array?

Upvotes: 1
