Reputation: 1533
I wrote this code snippet in Python:
import numpy as np

def remove_randomly(data, percentage):
    test_list = []
    np.random.shuffle(data)
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            roll = np.random.randint(low=1, high=100)
            if roll > percentage:
                # remember the position and original value, then zero the cell
                test_list.append((i, j, data[i, j]))
                data[i, j] = 0
    return test_list
It takes a matrix data and a number percentage, iterates over the entire matrix, and zeros roughly (100 - percentage)% of the elements, saving each zeroed element to a list called test_list.
Is there a better, more efficient way to achieve this result? I've heard nested loops are bad for your health. Plus, my data matrix happens to be huge, so iterating with for loops is very slow.
Example
Suppose data is the matrix [1, 2; 3, 4] and percentage is 25%.
Then I would like the output to be (for example) data = [1, 2; 0, 4] and test_list = [(1, 0, 3)]
Upvotes: 0
Views: 61
Reputation: 44828
Here's what you can do:
def remove_randomly(data, percent):
    np.random.shuffle(data)
    # array of random integers with the same shape as data
    roll = np.random.randint(1, 100, data.shape)
    # indices of elements in `roll` that are greater than the percentage
    indices = np.where(roll > percent)
    test_list = data[indices]
    data[indices] = 0
    return indices, test_list  # return indices and the values
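To get back the (i, j, value) triples the question asks for, the returned index arrays can be zipped with the values. Here is a minimal sketch, assuming the 2x2 matrix from the question's example and a hand-picked boolean mask standing in for `roll > percent` (the mask is illustrative, not part of the original answer):

```python
import numpy as np

data = np.array([[1, 2], [3, 4]])
# Hypothetical mask standing in for `roll > percent`: True marks a cell to remove.
mask = np.array([[False, False], [True, False]])

indices = np.where(mask)   # tuple of (row_indices, col_indices)
values = data[indices]     # index-array selection returns a copy, not a view
data[indices] = 0          # zero the selected cells in place

# Pair each row/column index with the value that was removed.
test_list = list(zip(indices[0].tolist(), indices[1].tolist(), values.tolist()))
print(test_list)
print(data)
```

Because indexing with index arrays copies the selected values, zeroing data afterwards does not change what was saved.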
Note that np.random.randint(1, 100) only generates integers in the half-open range [1, 100), so a roll of 100 is never produced.
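One way to check the half-open range, and a possible fix, is to sample many rolls and look at the extremes. This sketch uses np.random.default_rng with an arbitrary seed and sample size (my choices, not from the answer above). Generator.integers excludes the upper bound by default, like np.random.randint, and includes it when endpoint=True is passed:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Default behaviour: values fall in [1, 100), so 100 never appears.
half_open = rng.integers(1, 100, size=100_000)
# endpoint=True makes the upper bound inclusive: values fall in [1, 100].
inclusive = rng.integers(1, 100, size=100_000, endpoint=True)

print(half_open.min(), half_open.max())
print(inclusive.min(), inclusive.max())
```

With the legacy API, np.random.randint(1, 101) covers the same inclusive range.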
Upvotes: 2