Reputation: 1626
I want to randomly select rows from a numpy array. Say I have this array:
import numpy as np
A = np.array([[1, 3, 0],
              [3, 2, 0],
              [0, 2, 1],
              [1, 1, 4],
              [3, 2, 2],
              [0, 1, 0],
              [1, 3, 1],
              [0, 4, 1],
              [2, 4, 2],
              [3, 3, 1]])
To randomly select say 6 rows, I am doing this:
B = A[np.random.choice(A.shape[0], size=6, replace=False), :]
I want another array C that contains the rows which were not selected in B.
Is there a built-in method for this, or do I need a brute-force approach that checks the rows of B against the rows of A?
Upvotes: 3
Views: 6682
Reputation: 15349
You can make any number of row-wise random partitions of A
by slicing a shuffled sequence of row indices:
ind = numpy.arange( A.shape[ 0 ] )
numpy.random.shuffle( ind )
B = A[ ind[ :6 ], : ]
C = A[ ind[ 6: ], : ]
If you don't want to change the order of the rows in each subset, you can sort each slice of the indices:
B = A[ sorted( ind[ :6 ] ), : ]
C = A[ sorted( ind[ 6: ] ), : ]
(Note that the solution provided by @MaxNoe also preserves row order.)
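As a self-contained sketch of the shuffled-index partition above: the snippet below swaps the `numpy.random.shuffle` call for NumPy's newer `Generator` API (`np.random.default_rng` and `permutation`). The helper name `split_rows`, its `keep_order` flag, and the seed value are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def split_rows(A, n_selected, rng=None, keep_order=False):
    """Split the rows of A into n_selected random rows and the remaining rows.

    Hypothetical helper; the name and signature are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    ind = rng.permutation(A.shape[0])        # shuffled row indices
    first, rest = ind[:n_selected], ind[n_selected:]
    if keep_order:
        # Sorting the index slices keeps the rows in their original order.
        first, rest = np.sort(first), np.sort(rest)
    return A[first], A[rest]

A = np.array([[1, 3, 0], [3, 2, 0], [0, 2, 1], [1, 1, 4], [3, 2, 2],
              [0, 1, 0], [1, 3, 1], [0, 4, 1], [2, 4, 2], [3, 3, 1]])

# The seed is an arbitrary choice so that the split is reproducible.
B, C = split_rows(A, 6, rng=np.random.default_rng(0), keep_order=True)
```

Because both subsets come from disjoint slices of one permutation, every row of A lands in exactly one of B or C, even when A contains duplicate rows.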
Upvotes: 3
Reputation: 14987
You can use a boolean mask: draw random indices from an integer array with one entry per row of A, set those positions to True, and index with the mask. The ~ operator is an elementwise not:
idx = np.arange(A.shape[0])
mask = np.zeros_like(idx, dtype=bool)
selected = np.random.choice(idx, 6, replace=False)
mask[selected] = True
B = A[mask]
C = A[~mask]
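One property of the mask approach that may be worth seeing in a runnable form: boolean indexing returns rows in their original order, so neither B nor C is shuffled. The sketch below is a minimal check of that, assuming placeholder data with distinct rows and an arbitrary seed (both are illustrative, not from the answer).

```python
import numpy as np

# Placeholder data: 10 distinct rows, so the row order is easy to inspect.
A = np.arange(30).reshape(10, 3)
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Mark 6 randomly chosen rows as selected.
mask = np.zeros(A.shape[0], dtype=bool)
mask[rng.choice(A.shape[0], size=6, replace=False)] = True

B = A[mask]    # selected rows, in their original order
C = A[~mask]   # remaining rows, in their original order
```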
Upvotes: 1
Reputation: 85432
This gives you the indices for the selection:
sel = np.random.choice(A.shape[0], size=6, replace=False)
and this gives B:
B = A[sel]
Get all not selected indices:
unsel = list(set(range(A.shape[0])) - set(sel))
and use them for C:
C = A[unsel]
Instead of using set and list, you can use this:
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
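A small sketch comparing the two ways of computing the unselected indices. `np.setdiff1d` returns its result sorted, whereas the iteration order of a Python set is not guaranteed, so the set-based result is sorted explicitly here before comparing. The variable names `n_rows`, `unsel_set`, and `unsel_np` are illustrative.

```python
import numpy as np

n_rows = 10  # stands in for A.shape[0]
sel = np.random.choice(n_rows, size=6, replace=False)

# Pure Python: set difference, sorted to get a deterministic order.
unsel_set = sorted(set(range(n_rows)) - set(sel.tolist()))

# NumPy: setdiff1d returns the sorted values of the first array
# that do not appear in the second.
unsel_np = np.setdiff1d(np.arange(n_rows), sel)
```

Either result can be used to index A and obtain C.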
For the small example array, the pure Python version:
%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel))
100000 loops, best of 3: 8.42 µs per loop
is faster than the NumPy version:
%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
10000 loops, best of 3: 77.5 µs per loop
For larger A, the NumPy version is faster:
A = np.random.random((int(1e4), 3))
sel = np.random.choice(A.shape[0], size=6, replace=False)
%%timeit
unsel1 = list(set(range(A.shape[0])) - set(sel))
1000 loops, best of 3: 1.4 ms per loop
%%timeit
unsel2 = np.setdiff1d(np.arange(A.shape[0]), sel)
1000 loops, best of 3: 315 µs per loop
Upvotes: 1