Bobby Stiller

Reputation: 147

Running a for loop through 2 arrays of different lengths and index manipulation

I need to run a for loop through 2 arrays of different lengths. One array is 8760 by 1 and the other is 10 by 1. If the index of a value in the long array is equal to a value in the short array, I don't want to change anything. If the index of a value in the long array isn't equal to any value in the short array, I want to set that value to zero. I know the code I have is wrong, but it's a start. I couldn't attach the longer array, but it could be random values for now.

I = np.array([4993, 4994, 4995, 5016, 5017, 5018, 5019, 5042, 5043, 5066])

import numpy as np

A = np.loadtxt('A.txt')
I = np.loadtxt('I.txt')

for i in A:
    for j in I:
        if A[j] != I[j]:
            i = 0

Upvotes: 1

Views: 2343

Answers (2)

gboffi

Reputation: 25093

Since the general idea of numpy is "no loops", I'd like to show how to perform the task proposed by the OP without any (explicit) loop.

We are going to use the extended addressing capabilities of numpy (what its documentation calls advanced, or "fancy", indexing) to let it deal, at its own speed, with the data manipulation details.

To present an example, I need some data and a list (or vector), here named persisting, of the indices of the values that must persist in the modified data array at the end of the procedure.

import numpy as np
data = np.arange(10)
persisting = [4,8,1,0]

Given these preliminaries, we can compute a held array that holds all the data elements we want to persist, by indexing the data array with the persisting list.

held = data[persisting]

The data array can be filled with zeros at full speed using the array method .fill(); finally, the elements saved in held can be restored to their original places, again using extended addressing, this time on the left of the assignment.

data.fill(0)
data[persisting] = held
print(data) #>>> [0 1 0 0 4 0 0 0 8 0]
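As a rough sketch, the same three steps might look like this at the sizes given in the question. The names long_array and keep are made up for illustration, and the random values are only a stand-in for the real A. One assumption worth flagging: np.loadtxt returns floats by default, so the indices are cast to int before being used for indexing.

```python
import numpy as np

# Stand-in for the question's 8760-element array; the +1.0 keeps every
# value strictly nonzero so the effect of the zero-fill is easy to check.
rng = np.random.default_rng(0)
long_array = rng.random(8760) + 1.0
original = long_array.copy()

# The indices from the question, as floats (what np.loadtxt would return).
keep = np.array([4993., 4994., 4995., 5016., 5017.,
                 5018., 5019., 5042., 5043., 5066.])
keep = keep.astype(int)  # float arrays cannot be used as indices

held = long_array[keep]  # save the values that must persist
long_array.fill(0.0)     # zero everything
long_array[keep] = held  # restore the saved values in place

print(np.count_nonzero(long_array))  # prints 10
```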

The procedure sketched above may or may not be faster than other approaches, depending on the length of the array you are manipulating and on how much of it you are zero-filling rather than keeping. If yours is a production problem, you should consider benchmarking the different approaches that have been suggested to you.

Benchmarking

I have timed the methods proposed in the accepted answer and in mine. What follows is the transcript of an IPython session that documents my procedure.

The results printed are the length of the data array, the length of the indices array, and the time (in seconds, averaged over 7 repetitions) taken to execute each of the two functions.

In [38]: import numpy as np

In [39]: def accepted(A, I):
    ...:     good_elements_indices = I
    ...:     all_elements = A
    ...:     for all_elements_index in range(len(all_elements)):
    ...:         if all_elements_index not in good_elements_indices:
    ...:             A[all_elements_index] = 0.0
    ...:             

In [40]: def alternate(a, i):
    ...:     held = a[i]
    ...:     a.fill(0.0)
    ...:     a[i] = held
    ...:     

In [41]: for length in (100, 10000, 1000000):
    ...:     a = np.arange(length)
    ...:     for remain in (10, 100, 10000, length//2):
    ...:         if remain < length:
    ...:             i = np.random.choice(length, remain)
    ...:             acc_t = %timeit -q -o  accepted(a, i)
    ...:             alt_t = %timeit -q -o alternate(a, i)
    ...:             print('%10d, %10d: %e, %e;'%(
    ...:                    length, remain, acc_t.average, alt_t.average))
    ...:                    
       100,         10: 1.506336e-04, 1.126085e-06;
       100,         50: 1.663167e-04, 1.385859e-06;
     10000,         10: 1.539412e-02, 5.308621e-06;
     10000,        100: 2.198021e-02, 6.056105e-06;
     10000,       5000: 2.995333e-01, 3.775863e-05;
   1000000,         10: 1.524685e+00, 1.596268e-03;
   1000000,        100: 2.187460e+00, 1.599069e-03;
   1000000,      10000: 7.067548e+01, 1.770094e-03;
^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)

Note that I had to interrupt the timing procedure: %timeit computes the average over seven repetitions, meaning that the last row printed took more than 8 minutes to complete...

I have to say that the accepted answer could be sped up quite easily, but I think that, since it has to deal with loops and tests, it will still be slower.
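One possible speed-up, sketched here under assumptions rather than timed: converting the indices to a Python set makes each membership test roughly constant time, instead of a scan of the whole indices array. The function name accepted_with_set is hypothetical and appears in neither answer.

```python
import numpy as np

def accepted_with_set(values, keep_indices):
    """Zero every element of values whose index is not in keep_indices."""
    # Build the set once; int() also copes with float indices from np.loadtxt.
    keep = {int(k) for k in keep_indices}
    for index in range(len(values)):
        if index not in keep:  # set lookup instead of scanning an array
            values[index] = 0.0

demo = np.arange(10, dtype=float) + 1.0  # 1.0, 2.0, ..., 10.0
accepted_with_set(demo, np.array([4., 8., 1., 0.]))
```

This still loops in Python over every element, so it is not expected to match the loop-free version.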

Upvotes: 1

Bennett Brown

Reputation: 5383

Main principle

When you need a for loop to walk across the indices of an array, use a "counted" loop: a loop that iterates across a set of integers. Use for index in range(len(your_list)).

Your specific problem

Given your I, I think you are asking to set all values of A to 0 (e.g. assign A[5] = 0) except those at the indices listed in I: for example, A[4993] will be unchanged.

good_elements_indices = I
all_elements = A
for all_elements_index in range(len(all_elements)):
    if all_elements_index not in good_elements_indices:
        A[all_elements_index] = 0

Additional comments

  • Python style uses variable names that are lower case with underscores between words and no abbreviations. Hence I renamed I and A. See PEP 8: Python Style Guide
  • The in operator is core Python. Because you're already importing numpy and your good_elements_indices (aka I) is already a numpy.array object, it is faster, but less general, to use the numpy.isin function as suggested by mrcl.
  • Your question talked about "running a for loop through 2 arrays." That suggests not one for loop but two for loops, nested, as your question's code shows. The in operator itself iterates, like a for loop, across the array to test whether the index is present, so the single visible loop hides a second one.
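A minimal sketch of the numpy.isin variant mentioned above, assuming a small made-up array in place of A; the names values and keep_mask are illustrative only.

```python
import numpy as np

values = np.arange(10, dtype=float) + 1.0     # 1.0, 2.0, ..., 10.0
good_elements_indices = np.array([4, 8, 1, 0])

# True at every position whose index appears in good_elements_indices.
keep_mask = np.isin(np.arange(values.size), good_elements_indices)

# Zero everything that is not marked as kept.
values[~keep_mask] = 0.0
```

Here the membership test for all indices happens inside numpy in one call, which replaces both the explicit loop and the repeated in checks.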

Upvotes: 1
