Reputation: 147
I need to run a for loop through 2 arrays of different lengths. One array is 8760 by 1 and the other is 10 by 1. If a value in the short array is equal to the index of a value in the long array, I don't want to change anything. If the index of a value in the long array isn't equal to any value in the short array, I want to set that value to zero. I know the code I have is wrong, but it's a start. I couldn't attach the longer array, but it could be random values for now.
I = np.array([4993, 4994, 4995, 5016, 5017, 5018, 5019, 5042, 5043, 5066])
import numpy as np
A = np.loadtxt('A.txt')
I = np.loadtxt('I.txt')
for i in A:
    for j in I:
        if A[j] != I[j]:
            i = 0
Upvotes: 1
Views: 2343
Reputation: 25093
The general idea of numpy being "no loops", I'd like to show how it is possible to perform the task proposed by the OP without any (explicit) loop.
We are going to use the extended addressing capabilities of numpy (what its documentation calls advanced, or fancy, indexing) to let it deal, at its own speed, with the data manipulation details.
To present an example, I need some data and a list, or vector, here named persisting, of the indices of those values that must persist in the modified data array at the end of our procedure.
import numpy as np
data = np.arange(10)
persisting = [4,8,1,0]
Given these preliminaries, we can compute a held array that holds all the data elements that we want to persist, by indexing the data array with the persisting list:
held = data[persisting]
The data array can then be filled with zeros at full speed using the array method .fill(), and finally the elements saved in held can be restored to their original places, again using extended addressing, this time on the left of the assignment.
data.fill(0)
data[persisting] = held
print(data) #>>> [0 1 0 0 4 0 0 0 8 0]
The procedure sketched above may or may not be faster than other approaches, depending on the length of the array you are manipulating and on how much of it you are zero-filling rather than keeping. If yours is a production problem, you should benchmark the various approaches that have been suggested to you.
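One more candidate for such a benchmark, offered only as a sketch: if the original array must stay intact, the same indexing idea can write into a fresh array of zeros instead of modifying data in place. The function name zero_except is made up for this illustration and does not appear anywhere else in this thread.

```python
import numpy as np

def zero_except(values, keep):
    """Return a copy of `values` where every position not listed in `keep` is zero.

    `zero_except` is a hypothetical helper name used only for this sketch.
    """
    result = np.zeros_like(values)   # same shape and dtype, all zeros
    result[keep] = values[keep]      # copy over only the persisting elements
    return result

data = np.arange(10)
persisting = [4, 8, 1, 0]

print(zero_except(data, persisting))  # [0 1 0 0 4 0 0 0 8 0]
print(data)                           # the input is untouched: [0 1 2 3 4 5 6 7 8 9]
```

The cost is one extra array allocation, so whether it beats the in-place version is again something to measure rather than assume.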
I have timed the method proposed in the accepted answer and the one proposed in mine. What follows is the transcript of an IPython session that documents my procedure.
Each printed row gives the length of the data array, the length of the indices array, and the time in seconds (averaged over 7 repetitions) taken to execute each of the two functions.
In [38]: import numpy as np
In [39]: def accepted(A, I):
    ...:     good_elements_indices = I
    ...:     all_elements = A
    ...:     for all_elements_index in range(len(all_elements)):
    ...:         if all_elements_index not in good_elements_indices:
    ...:             A[all_elements_index] = 0.0
    ...:
In [40]: def alternate(a, i):
    ...:     held = a[i]
    ...:     a.fill(0.0)
    ...:     a[i] = held
    ...:
In [41]: for length in (100, 10000, 1000000):
    ...:     a = np.arange(length)
    ...:     for remain in (10, 100, 10000, length//2):
    ...:         if remain < length:
    ...:             i = np.random.choice(length, remain)
    ...:             acc_t = %timeit -q -o accepted(a, i)
    ...:             alt_t = %timeit -q -o alternate(a, i)
    ...:             print('%10d, %10d: %e, %e;'%(
    ...:                 length, remain, acc_t.average, alt_t.average))
    ...:
100, 10: 1.506336e-04, 1.126085e-06;
100, 50: 1.663167e-04, 1.385859e-06;
10000, 10: 1.539412e-02, 5.308621e-06;
10000, 100: 2.198021e-02, 6.056105e-06;
10000, 5000: 2.995333e-01, 3.775863e-05;
1000000, 10: 1.524685e+00, 1.596268e-03;
1000000, 100: 2.187460e+00, 1.599069e-03;
1000000, 10000: 7.067548e+01, 1.770094e-03;
^C---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
Note that I had to interrupt the timing procedure: %timeit computes the average over seven repetitions, so the last row alone took more than 8 minutes to complete.
I have to say that the accepted answer could be sped up quite easily, but I think that, since it still has to run a loop and a test for every element in Python, it would remain slower.
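As one possible reading of "sped up quite easily" (my assumption, not something I have timed here): the test index not in I scans the whole numpy array on every iteration, whereas a Python set answers membership by hashing. The function name accepted_with_set is made up for this sketch.

```python
import numpy as np

def accepted_with_set(A, I):
    """Same loop as `accepted`, but membership is tested against a set.

    `accepted_with_set` is a hypothetical name used only for this sketch.
    """
    keep = set(np.asarray(I).tolist())   # build the set once, outside the loop
    for index in range(len(A)):
        if index not in keep:            # hash lookup instead of a linear scan
            A[index] = 0.0

a = np.arange(20, dtype=float)
accepted_with_set(a, np.array([3, 7, 7, 15]))   # repeated indices are harmless
print(np.flatnonzero(a).tolist())               # [3, 7, 15]
```

The loop over every element of A is still executed by the interpreter, which is why I would not expect it to catch up with the loop-free version.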
Upvotes: 1
Reputation: 5383
Main principle
When you need a for loop to walk across the indices of an array, use a "counted" loop, that is, a loop that iterates across a set of integers: for index in range(len(your_list)).
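A small sketch of why this matters, using a made-up list named values: assigning to the loop variable, as the question's code does with i = 0, only rebinds that name, while assigning through an index changes the container itself.

```python
values = [10, 20, 30, 40]   # hypothetical data for illustration

# Rebinding the loop variable leaves the list as it was.
for value in values:
    value = 0
snapshot = list(values)
print(snapshot)  # [10, 20, 30, 40]

# A counted loop assigns through the index, so the list is modified.
for index in range(len(values)):
    values[index] = 0
print(values)    # [0, 0, 0, 0]
```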
Your specific problem
Given your I, I think you are asking to set all values of A to 0 (e.g. assign A[5] = 0), except that, for example, A[4993] will be unchanged, and so on for the other indices in I.
good_elements_indices = I
all_elements = A
for all_elements_index in range(len(all_elements)):
    # keep the value only when its position is listed in I
    if all_elements_index not in good_elements_indices:
        A[all_elements_index] = 0
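For comparison, a minimal sketch of the same selection done with numpy.isin, assuming A and I are numpy arrays. The small arrays below are stand-ins for the real 8760-element and 10-element data, and keep_mask is a name chosen only for this example.

```python
import numpy as np

A = np.arange(100, 110, dtype=float)   # stand-in for the 8760-element array
I = np.array([2, 5, 6])                # stand-in for the array of indices to keep

# True at every position of A whose index appears in I.
keep_mask = np.isin(np.arange(len(A)), I)

# Zero everything that is not marked as kept.
A[~keep_mask] = 0
print(A.tolist())  # [0.0, 0.0, 102.0, 0.0, 0.0, 105.0, 106.0, 0.0, 0.0, 0.0]
```

The membership test runs inside numpy for all positions at once, so no Python-level loop is needed.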
Additional comments
- Consider more descriptive names than I and A. See PEP 8: Python Style Guide.
- The in operator is core Python. Because you're already importing numpy and your good_elements_indices, aka I, is already a numpy.array object, it is faster but less general to use the numpy.isin function, as suggested by mrcl.
- This is not really one for loop but two for loops, nested, as your question's code shows. The in operator actually iterates like a for loop across the list, testing for existence within the set.
Upvotes: 1