david

Reputation: 1265

How do I assign a label to array rows?

I have an array A of size 600x6 in which each row is a vector, and I want to calculate the distance from each row to every other row of the array. Calculating the distance (BD distance) is easy, and I can compute all of the distances and store them in a 600x600 matrix D. However, inside my code I only have the values of a row, not its index, so I cannot use D to look up the distance quickly and have to calculate it again.

My question: is there a way to assign a label or index to each row of A? For example, if I have rows A1 and A2, I could immediately see that I need entry (1, 2) of D for their distance. I am not very familiar with Python. Could you please tell me how I can do this without calculating the distance each time?

As you can see in the following code, the centroid changes in the next step of the code, so I have to calculate the BD distance again, which is time-consuming. If I could save the index of the centroid, I could extract the distance from my distance matrix very quickly.

import numpy as np

# rate, bd_rate and bd_PSNR are not shown here
def kmeans_BD(psnr_bitrate, K, centroid):
    m = psnr_bitrate.shape[0]  # number of samples
    n = psnr_bitrate.shape[1]  # number of bitrates

    # BD[i, k] holds the distance between sample i and centroid k
    BD = np.zeros((m, K))
    wr = 0.5  # weight of BD_rate
    wq = 0.5  # weight of BD_Q
    n_itr = 10
    # compute the distance from every centroid to every sample
    for itr in range(n_itr):
        for k in range(K):
            for i in range(len(psnr_bitrate)):
                BD_R = bd_rate(rate, centroid[k, :], rate, psnr_bitrate[i, :])
                if BD_R == -2:
                    BD_R = np.inf
                BD_Q = bd_PSNR(rate, centroid[k, :], rate, psnr_bitrate[i, :])
                if BD_Q == -2:
                    BD_Q = np.inf
                BD[i, k] = np.abs(wr * BD_R + wq * BD_Q)

Upvotes: 1

Views: 132

Answers (1)

user7711283

This answer has been updated to address the remarks made in the comments about problems with the previously provided code.

The getIndex() function is the core of the solution requested in the question and should now work with the common array types (Python list, numpy ndarray, sympy Array, ...). Given the value of an array item, it tries several methods to find the item's index: list.index() for Python lists and numpy.where() for numpy arrays. If no method specialized for the datatype is available, the index is found with a loop that compares items directly and falls back to Python's all() function when the direct comparison fails.

To demonstrate the functionality, the code comes with a getDistance() function and example array data. The assert statements check that the code works as expected:

def getDistance(vector_1, vector_2, vector_matrix_A, distance_matrix_D):
    try: 
        distance = distance_matrix_D[
            getIndex(vector_matrix_A, vector_1)][
            getIndex(vector_matrix_A, vector_2)]
        return distance
    except Exception:
        print("getDistance() exception, returning None")
        return None

def getIndex(vectorArray, vector, verbose=True):
    if isinstance(vectorArray, list) and isinstance(vector, list):
        if verbose: print('list.index()')
        return vectorArray.index(vector)
    try: 
        import numpy
        if isinstance(vectorArray, numpy.ndarray) and isinstance(vector, numpy.ndarray):
            indx, = numpy.where(numpy.all(vectorArray==vector, axis=1))
            if verbose: print('numpy.where()')
            return indx[0]
    except Exception:
        pass  # numpy is unavailable, or no row matched
    for indx, item in enumerate(vectorArray):
        try: 
            if vector == item:
                if verbose: print('if vector == item')
                return indx
        except Exception:
            if all( vector[i] == item[i] for i in range(len(vector))):
                if verbose: print('if all()')
                return indx
    return None

A = [ [i*item for i in (range(1,4))] for item in range(1,7)]
assert A == [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12], [5, 10, 15], [6, 12, 18]]
D = []
for row in range(6):
    column = []
    for colval in range(1+6*row,7+6*row):
        column.append(colval)
    D.append(column)
assert D == [
              [ 1,  2,  3,  4,  5,  6], 
              [ 7,  8,  9, 10, 11, 12], 
              [13, 14, 15, 16, 17, 18], 
              [19, 20, 21, 22, 23, 24], 
              [25, 26, 27, 28, 29, 30], 
              [31, 32, 33, 34, 35, 36],
            ]
vector_3 = A[3]
vector_5 = A[5]
assert getDistance(   vector_3,    vector_5,    A, D) == 24

import numpy
np_A        = numpy.array(A)
np_vector_3 = numpy.array(vector_3) 
np_vector_5 = numpy.array(vector_5) 
assert getDistance(np_vector_3, np_vector_5, np_A, D) == 24

import sympy
sp_A        = sympy.Array(A)
sp_vector_3 = sympy.Array(vector_3) 
sp_vector_5 = sympy.Array(vector_5) 
assert getDistance(sp_vector_3, sp_vector_5, sp_A, D) == 24
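
Since getIndex() scans the array on every call, repeated lookups on a 600-row array may still be slow. A possible alternative, sketched below under the assumption that A is a numpy array and that the vectors you look up are exact copies of its rows (no floating-point rounding in between), is to build a dictionary from row contents to row index once and reuse it. The names build_row_index, lookup_distance and centroid_idx are made up for this illustration and are not part of the code above. The last lines show the other option hinted at in the question: if you keep the row indices of the centroids instead of their values, the distances to all centroids can be sliced out of D directly.

```python
import numpy as np

def build_row_index(A):
    """Map the raw bytes of each row of A to the index of its first occurrence."""
    A = np.ascontiguousarray(A)
    index = {}
    for i, row in enumerate(A):
        index.setdefault(row.tobytes(), i)
    return index

def lookup_distance(D, row_index, v1, v2, dtype):
    """Return D[i, j] for the rows equal to v1 and v2, or None if one is unknown."""
    # Convert to the dtype of A so that the byte patterns are comparable.
    i = row_index.get(np.ascontiguousarray(v1, dtype=dtype).tobytes())
    j = row_index.get(np.ascontiguousarray(v2, dtype=dtype).tobytes())
    if i is None or j is None:
        return None
    return D[i, j]

# Same example data as above, as numpy arrays.
A = np.array([[i * item for i in range(1, 4)] for item in range(1, 7)], dtype=float)
D = np.arange(1, 37).reshape(6, 6)

row_index = build_row_index(A)  # built once, reused for every lookup
print(lookup_distance(D, row_index, A[3], A[5], A.dtype))  # 24

# Hypothetical: centroids stored as row indices into A rather than as values.
centroid_idx = np.array([0, 4])
BD = D[:, centroid_idx]  # BD[i, k] == D[i, centroid_idx[k]]
```

The dictionary only finds rows that match byte for byte, so it helps when the centroid is always one of the rows of A. If the centroid is a newly computed vector (for example a mean of several rows), it has no entry in D and its distances still have to be calculated.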

Upvotes: 2
