Fast row deletion in numpy

Question

I am working with a big numpy matrix (approximately 75k rows of 2 integers each) from which I have to delete some rows. Is there a fast way to delete a row without regenerating the whole array, i.e. is there a function to change just the "mask" (or whatever it is called) of the matrix, without actually deleting the row in memory? I could then regenerate a clean matrix after I have deleted all the proper rows.

hpaulj · Accepted Answer

The fast way to select rows from an array is with a slice, which produces a view. But that requires a regular pattern like 'every-nth' row. Any other selection (a list of indices or a boolean mask) produces a copy.

x[::10,:]   # view
x[[1,3,6,10,20],:]   # copy
x[[True,False,False,True,False,...],:]   # copy
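
One way to check the view/copy distinction yourself is np.shares_memory. This is a minimal sketch; the array x and the particular indices are made up for illustration.

```python
import numpy as np

# Hypothetical example array: 100 rows of 2 integers each.
x = np.arange(200).reshape(100, 2)

sliced = x[::10, :]                 # basic slice with a regular step
fancy = x[[1, 3, 6, 10, 20], :]     # list of row indices

mask = np.zeros(len(x), dtype=bool)
mask[[0, 3]] = True
masked = x[mask, :]                 # boolean mask

# A view shares its data buffer with x; a copy does not.
print(np.shares_memory(x, sliced))  # True
print(np.shares_memory(x, fancy))   # False
print(np.shares_memory(x, masked))  # False
```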

np.delete lets you specify which rows to remove, but one way or another it ends up making a copy that contains the remaining rows. It's a complex function that uses different methods depending on what you specify, but in many cases it constructs a mask, as @jakevdp demonstrates.
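
The mask idea can be sketched by hand as follows. This illustrates the approach only and is not np.delete's actual source; x and rows_to_remove are made-up example values.

```python
import numpy as np

# Hypothetical example data.
x = np.arange(20).reshape(10, 2)
rows_to_remove = [1, 4, 7]

# Start by keeping every row, then switch off the ones to remove.
keep = np.ones(len(x), dtype=bool)
keep[rows_to_remove] = False

via_mask = x[keep]                                  # copy of the remaining rows
via_delete = np.delete(x, rows_to_remove, axis=0)   # also a copy

print(np.array_equal(via_mask, via_delete))  # True
```

Either way x itself is left untouched, and the result is a new array.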

So the fastest way to delete a bunch of rows is to delete them (or select their complement) all at once. Deleting one at a time is the slow way.
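
For the situation in the question, that could mean keeping a boolean "keep" mask next to the array, switching entries off as rows get marked for deletion, and building the clean array once at the end. A rough sketch, assuming random data of the size mentioned in the question; the deletion criteria here are invented for illustration.

```python
import numpy as np

# Hypothetical data shaped like the question's array: 75k rows of 2 integers.
rng = np.random.default_rng(0)
x = rng.integers(0, 1000, size=(75_000, 2))

# One flag per row; flipping a flag is cheap and does not touch x.
keep = np.ones(len(x), dtype=bool)

# Mark rows in as many passes as needed (made-up criteria).
keep[x[:, 0] == x[:, 1]] = False   # rows whose two values are equal
keep[[5, 17, 42]] = False          # a few specific row numbers

# A single copy at the end holds only the rows that were never marked.
clean = x[keep]
print(x.shape, clean.shape)
```

Until the final step you can still see the "current" contents with x[keep], at the cost of a copy each time you do so.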
