Reputation: 77
I would like to first combine three arrays of equal shape element-wise and then join duplicated entries of a smaller array to the result. I am able to do this, but my method has many steps, and I would like to know if there is a more direct (and faster?) approach. Seeing the regularity in the target, it seems clear that some list comprehensions could sort it out. However, I would like to keep the code readable so another person can follow it, while staying efficient enough to scale to a couple of million rows, hence NumPy operations were my first thought.
Take as given three 3D arrays x, y and z of the same shape a x b x c, as well as a shallower array m whose first dimension also has length a. We need to combine x, y and z, remove the 2nd and 3rd columns of m, then broadcast m across the combined array and return the result as a 2D array.
Here is my method. The generic arrays below stand in for the given inputs -- the numbers are just labels, so that the target result is clearer.
>>> import numpy as np
>>> a = 2
>>> b = 3
>>> c = 4
>>> n = a * b * c
>>> x = np.linspace(0, n - 1, n).reshape(a, b, c)
>>> y = x + n
>>> z = y + n
x = [[[ 0.  1.  2.  3.]
  [ 4.  5.  6.  7.]
  [ 8.  9. 10. 11.]]

 [[12. 13. 14. 15.]
  [16. 17. 18. 19.]
  [20. 21. 22. 23.]]]
>>> m = 3 * n + np.linspace(0, 13 * a - 1, 13 * a).reshape(a, 13)
m = [[72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84.]
[85. 86. 87. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97.]]
Combine x, y and z so that the corresponding entries of each become a row.
>>> triples = np.stack((x, y, z), axis=3)
The 2nd and 3rd columns of m are not needed.
>>> cleared = np.delete(m, np.s_[1:3], axis=1)
'Manually broadcast', bringing the dimensions up to the correct shape. I assume it is possible to do this in one go but I am not sure how (one possibility is sketched just after these two lines).
>>> sizeup1 = np.linspace(cleared, cleared, c, axis=1)
>>> sizeup2 = np.linspace(sizeup1, sizeup1, b, axis=1)
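As a side note, the "one go" version of this step might be achievable with np.broadcast_to (an assumption on my part, not part of the original method); it returns a read-only view rather than a copy, which is fine here because the later concatenate copies the data anyway.

>>> # sketch: expand cleared from (a, 11) straight to (a, b, c, 11);
>>> # broadcast_to gives a read-only view, no data is duplicated
>>> sizeup_bc = np.broadcast_to(cleared[:, None, None, :], (a, b, c, cleared.shape[1]))
>>> np.allclose(sizeup_bc, sizeup2)
True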
Concatenate each of the corresponding innermost rows.
>>> pre_res = np.concatenate((sizeup2, triples), axis=3)
The final result is a 2D array, and looks like this.
>>> result = pre_res.reshape(a * b * c, -1)
result = [[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 0. 24. 48.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 1. 25. 49.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 2. 26. 50.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 3. 27. 51.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 4. 28. 52.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 5. 29. 53.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 6. 30. 54.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 7. 31. 55.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 8. 32. 56.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 9. 33. 57.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 10. 34. 58.]
[72. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 11. 35. 59.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 12. 36. 60.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 13. 37. 61.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 14. 38. 62.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 15. 39. 63.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 16. 40. 64.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 17. 41. 65.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 18. 42. 66.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 19. 43. 67.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 20. 44. 68.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 21. 45. 69.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 22. 46. 70.]
[85. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 23. 47. 71.]]
The regularity in the final result may help in finding a more direct route, but the code should still be followable without having seen the target first. I expect anyone familiar with NumPy will easily spot oversights in my approach; is it perhaps even a one- or two-liner?
UPDATE: I have benchmarked the suggestions by @hpaulj and will share the results in case anyone finds them useful in the future. For small arrays, all options are almost identical. At any array size, the difference is negligible between
sizeup_opt1 = np.tile(cleared[:,None,None,:],(1,b,c,1))
and
sizeup_opt2 = np.repeat(cleared[:,None,:],b*c,1).reshape(a,b,c,11)
For very large arrays, on the other hand,
triples = np.stack((x, y, z), axis=3)
np.concatenate((sizeup_opt2, triples), axis=3).reshape(a * b * c, -1)
takes half the time of
np.concatenate([sizeup_opt2, x[...,None], y[...,None], z[...,None]], axis=3).reshape(a * b * c, -1)
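For anyone who wants to reproduce the comparison, a minimal timing sketch along these lines should work (assuming the arrays above are already built at the size of interest; the absolute numbers will depend on shapes and hardware):

import timeit
# variant 1: stack x, y, z first, then a single concatenate
t1 = timeit.timeit(lambda: np.concatenate(
    (sizeup_opt2, np.stack((x, y, z), axis=3)), axis=3).reshape(a * b * c, -1), number=100)
# variant 2: expand x, y, z individually inside the concatenate
t2 = timeit.timeit(lambda: np.concatenate(
    [sizeup_opt2, x[..., None], y[..., None], z[..., None]], axis=3).reshape(a * b * c, -1), number=100)
print(t1, t2)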
Upvotes: 2
Views: 193
Reputation: 231385
Some alternatives to the sizeup linspace:
In [113]: sizeup1.shape
Out[113]: (2, 4, 11)
In [114]: sizeup2.shape
Out[114]: (2, 3, 4, 11)
In [115]: cleared.shape
Out[115]: (2, 11)
In [116]: np.tile(cleared[:,None,None,:],(1,3,4,1)).shape
Out[116]: (2, 3, 4, 11)
In [117]: np.allclose(np.tile(cleared[:,None,None,:],(1,3,4,1)),sizeup2)
Out[117]: True
In [118]: np.allclose(np.repeat(cleared[:,None,:],3*4,1).reshape(2,3,4,11),sizeup2)
I suspect the one-repeat case will be fastest, but I haven't timed these.
In [120]: x.shape
Out[120]: (2, 3, 4)
In [121]: triples = np.stack((x, y, z), axis=3)
In [122]: triples.shape
Out[122]: (2, 3, 4, 3)
In [123]: pre_res = np.concatenate((sizeup2, triples), axis=3)
In [124]: pre_res.shape
Out[124]: (2, 3, 4, 14)
So the last concatenate joins a (2,3,4,11) with a (2,3,4,3).
stack effectively does:
np.concatenate([x[...,None], y[...,None], z[...,None]], axis=3)
that is, it adds a trailing dimension to each and joins them.
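A quick check with the arrays above bears this out:

In [125]: np.allclose(triples, np.concatenate([x[...,None], y[...,None], z[...,None]], axis=3))
Out[125]: True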
So pre_res could be:
np.concatenate([sizeup2, x[...,None], ....], axis=3)
but I'm not sure it's much of an improvement.
You could add the trailing dimension to x earlier on:
x = np.linspace(0, n-1, n).reshape(a, b, c, 1)
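Taken a step further, here is a sketch of what the whole pipeline could look like if all the trailing dimensions are built in from the start (the names x4, y4, z4, m4 and result_alt are just illustrative, and the tile alternative from above is reused for m):

# build x, y, z with the trailing axis already in place
x4 = np.linspace(0, n - 1, n).reshape(a, b, c, 1)
y4 = x4 + n
z4 = y4 + n
# expand the cleared m the same way, then one concatenate + reshape
m4 = np.tile(cleared[:, None, None, :], (1, b, c, 1))
result_alt = np.concatenate([m4, x4, y4, z4], axis=3).reshape(a * b * c, -1)
np.allclose(result_alt, result)   # True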
Upvotes: 2