Reputation: 5905
Given 2 arrays: one holding a master dataset, and a second holding groups of indices that reference the master dataset, I'm looking for the fastest way to generate new arrays from the given index data.
Here's my current solution for generating 2 arrays from a list of index pairs:
# Let's make a large point cloud with 1 million entries and a list of random paired indices
import numpy as np
COUNT = 1000000
POINT_CLOUD = np.random.rand(COUNT,3) * 100
INDICES = (np.random.rand(COUNT,2)*COUNT).astype(int) # (1,10),(233,12),...
# Split into sublists; np.squeeze is needed here because I don't want arrays of single elements.
LIST1 = POINT_CLOUD[np.squeeze(INDICES[:,[0]])]
LIST2 = POINT_CLOUD[np.squeeze(INDICES[:,[1]])]
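As an aside (not part of the original post): the squeeze can be avoided entirely, because indexing with a plain integer column like `INDICES[:,0]` already yields a 1-D array, while `INDICES[:,[0]]` keeps a trailing axis of length 1. A minimal sketch, with sizes reduced for illustration:

```python
import numpy as np

# Smaller reproduction of the setup above
COUNT = 1000
POINT_CLOUD = np.random.rand(COUNT, 3) * 100
INDICES = (np.random.rand(COUNT, 2) * COUNT).astype(int)

# INDICES[:, 0] is already 1-D, so no np.squeeze is needed;
# INDICES[:, [0]] would keep a trailing length-1 axis instead.
LIST1 = POINT_CLOUD[INDICES[:, 0]]
LIST2 = POINT_CLOUD[INDICES[:, 1]]

# Same result as the squeeze-based version
assert np.allclose(LIST1, POINT_CLOUD[np.squeeze(INDICES[:, [0]])])
assert LIST1.shape == (COUNT, 3)
```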
This works, but it's a little slow, and it's only good for generating 2 lists. It would be great to have a solution that could tackle index groups of any size (ex: ((1,2,3,4),(8,4,5,3),...)),
so something like:
# PSEUDO CODE using quadruple keys
INDICES = (np.random.rand(COUNT,4)*COUNT).astype(int)
SPLIT = POINT_CLOUD[<some pythonic magic>[INDICES]]
SPLIT[0] = np.array([points from INDEX #1])
SPLIT[1] = np.array([points from INDEX #2])
SPLIT[2] = np.array([points from INDEX #3])
SPLIT[3] = np.array([points from INDEX #4])
Upvotes: 0
Views: 447
Reputation: 32521
You just have to transpose the index array:
>>> result = POINT_CLOUD[INDICES.T]
>>> np.allclose(result[0], LIST1)
True
>>> np.allclose(result[1], LIST2)
True
If you know the number of sub-arrays, you can also unpack the result:
>>> result.shape
(2, 1000000, 3)
>>> L1, L2 = result
>>> np.allclose(L1, LIST1)
True
>>> # etc
This works for larger index groups. For the second example in your question:
>>> INDICES = (np.random.rand(COUNT,4)*COUNT).astype(int)
>>> SPLIT = POINT_CLOUD[INDICES.T]
>>> SPLIT.shape
(4, 1000000, 3)
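Putting it together, here's a self-contained sketch of the four-column case. It uses `np.random.randint` to draw the indices directly as integers (an equivalent alternative to scaling `np.random.rand`), and sizes are reduced for illustration:

```python
import numpy as np

COUNT = 1000  # reduced from 1 million for illustration
POINT_CLOUD = np.random.rand(COUNT, 3) * 100

# Four random indices per row, drawn directly as integers in [0, COUNT)
INDICES = np.random.randint(0, COUNT, size=(COUNT, 4))

# Transposing makes the leading axis run over index columns, so
# fancy indexing yields one (COUNT, 3) array per index column.
SPLIT = POINT_CLOUD[INDICES.T]
assert SPLIT.shape == (4, COUNT, 3)

# Unpack when the group size is known
A, B, C, D = SPLIT
assert np.allclose(A, POINT_CLOUD[INDICES[:, 0]])
assert np.allclose(D, POINT_CLOUD[INDICES[:, 3]])
```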
Upvotes: 1