gamzef

Reputation: 37

What is the most efficient way to loop through a 2D array in Python?

I am new to Python and machine learning, and I can't find the best way to do this on the internet. I have a big 2D array (distance_matrix.shape = (47, 1328624)). I wrote the code below, but it takes too long to run; the nested for loops are very slow.

import pandas as pd

distance_matrix = [[0.21218192, 0.12845819, 0.54545613, 0.92464129, 0.12051526, 0.0870853 ], [0.2168166 , 0.11174682, 0.58193855, 0.93949729, 0.08060061, 0.11963891], [0.23996999, 0.17554854, 0.60833433, 0.93914766, 0.11631545, 0.2036373]]

iskeleler = pd.DataFrame({
    'lat':[40.992752,41.083202,41.173462],
    'lon':[29.023165,29.066652,29.088163],
    'name':['Kadıköy','AnadoluHisarı','AnadoluKavağı']
}, dtype=str)

for i in range(len(distance_matrix)):
    for j in range(len(distance_matrix[0])):
        if distance_matrix[i][j] < 1:
            iskeleler.loc[i,'Address'] = distance_matrix[i][j]
        
print(iskeleler)

To explain, I am sharing the first 5 rows of my array and showing my dataframe.

The "İskeleler" dataframe has 47 rows. For each row i, I want to look at all the values in row i of distance_matrix, add up the ones that are less than 1, and store that sum in the 'Address' column of row i of "İskeleler". For example, for the first row of distance_matrix I want to compute 0.21218192 + 0.12845819 + 0.54545613 + ... and put the result in the 'Address' column of the first row of the dataframe.

My intent is to loop through distance_matrix and find the values that are smaller than 1. The code takes too long. How can I do this faster?

Upvotes: 2

Views: 1071

Answers (1)

Mark Setchell

Reputation: 207425

I think you mean this:

import numpy as np

# Set up some dummy data in range 0..100
distance = np.random.rand(47,1328624) * 100.0

# Boolean mask of all values < 1
mLessThan1 = distance<1

# Sum the elements < 1 along each row (one total per row)
result = np.sum(distance*mLessThan1, axis=1)

That takes 168ms on my Mac.

In [47]: %timeit res = np.sum(distance*mLessThan1, axis=1)
168 ms ± 914 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
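
To put the per-row totals into the dataframe, something along these lines should work. This is only a sketch on the small sample from the question: it assumes the rows of distance_matrix are in the same order as the rows of iskeleler, and it leaves out dtype=str so that lat and lon stay numeric (my assumption, not something the question requires). It uses np.where instead of multiplying by the mask, which gives the same sums for finite data; row_sums is just a name I picked.

```python
import numpy as np
import pandas as pd

# Sample rows taken from the question (the real matrix is 47 x 1328624)
distance_matrix = np.array([
    [0.21218192, 0.12845819, 0.54545613, 0.92464129, 0.12051526, 0.0870853],
    [0.2168166, 0.11174682, 0.58193855, 0.93949729, 0.08060061, 0.11963891],
    [0.23996999, 0.17554854, 0.60833433, 0.93914766, 0.11631545, 0.2036373],
])

# Assumption: one dataframe row per matrix row, in the same order
iskeleler = pd.DataFrame({
    'lat': [40.992752, 41.083202, 41.173462],
    'lon': [29.023165, 29.066652, 29.088163],
    'name': ['Kadıköy', 'AnadoluHisarı', 'AnadoluKavağı'],
})

# Keep values < 1, replace everything else with 0, then sum along each row
row_sums = np.where(distance_matrix < 1, distance_matrix, 0.0).sum(axis=1)

# Assign all the totals in one step instead of calling .loc inside a loop
iskeleler['Address'] = row_sums
print(iskeleler)
```

Assigning the whole column at once avoids the repeated iskeleler.loc[i, 'Address'] writes, which are likely a large part of the cost of the original double loop.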

Upvotes: 2
