How do I check which rows of one small array exist in another larger one?

Question

Given the following setup:

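import numpy as np

batch_size = 4  # must be defined before it is used below
final_batch = np.empty((batch_size, 2))
a = np.array(range(10))
b = np.array(range(10, 20))
edges = np.array([[0, 11], [0, 12], [1, 11], [1, 12], [0, 17]])

# Draw batch_size candidates from each array and pair them up row by row
c1 = np.random.choice(a, batch_size).reshape(-1, 1)
c2 = np.random.choice(b, batch_size).reshape(-1, 1)
samples = np.append(c1, c2, axis=1)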

Now some rows of samples can duplicate rows of edges. I want to keep calling np.random.choice and only add rows to final_batch if they don't already exist in edges. The simple way to do this would be to take them one by one in a loop:

while len(final_batch)


But a, b and edges can all be huge, and the batch size will be 10k. Since it's much faster to sample many elements at once, I wanted to see if there is a faster way. Something like:
while len(final_batch)

Note that the values in c1 and c2 are mutually exclusive (a and b do not overlap), so I feel like I should be able to use this somehow.
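
One possible way to do the check for a whole batch at once is sketched below. It is not a definitive implementation: it assumes that all values are non-negative integers and that each row (u, v) can be packed into a single int64 key as u * base + v, so that np.isin can compare rows as scalars. The function name sample_non_edges, the rng parameter and the variable base are hypothetical names introduced only for this illustration.

```python
import numpy as np

def sample_non_edges(a, b, edges, batch_size, rng=None):
    """Draw batch_size (a, b) pairs that do not appear as rows of edges.

    Hypothetical helper for illustration. Assumes non-negative integers
    small enough that u * base + v fits in int64.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Pack each row (u, v) into one integer key. base is larger than every
    # second-column value, so distinct rows always get distinct keys.
    base = int(max(b.max(), edges[:, 1].max())) + 1
    edge_keys = edges[:, 0].astype(np.int64) * base + edges[:, 1]

    kept = []
    n_kept = 0
    # Note: this loops forever if every possible (a, b) pair is in edges.
    while n_kept < batch_size:
        c1 = rng.choice(a, batch_size)
        c2 = rng.choice(b, batch_size)
        keys = c1.astype(np.int64) * base + c2
        # True for sampled rows that are NOT present in edges
        mask = ~np.isin(keys, edge_keys)
        new_rows = np.column_stack((c1[mask], c2[mask]))
        kept.append(new_rows)
        n_kept += len(new_rows)

    # Several rounds may overshoot, so trim to the requested size
    return np.concatenate(kept)[:batch_size]
```

With the setup from the question, sample_non_edges(a, b, edges, batch_size) would return an array of shape (batch_size, 2). Each round samples and filters a full batch, so the Python-level loop only repeats when many candidates collide with edges.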

